PREFACE


These notes build upon a course I taught at the University of Maryland during the fall 
of 1983. My great thanks go to Martino Bardi, who took careful notes, saved them all 
these years and recently mailed them to me. Faye Yeager typed up his notes into a first 
draft of these lectures as they now appear. 

I have radically modified much of the notation (to be consistent with my other writings), updated the references, added several new examples, and provided a proof of the Pontryagin Maximum Principle. As this is a course for undergraduates, I have dispensed in certain proofs with various measurability and continuity issues, and as compensation have added various critiques as to the lack of total rigor.

Scott Armstrong read over the notes and suggested many improvements: thanks. 


This current version of the notes is not yet complete, but meets I think the usual high standards for material posted on the internet. Please email me at evans@math.berkeley.edu with any corrections or comments.


CHAPTER 1: INTRODUCTION 
1.1. The basic problem 
1.2. Some examples 
1.3. A geometric solution 
1.4. Overview 


1.1 THE BASIC PROBLEM. 


DYNAMICS. We open our discussion by considering an ordinary differential equation 
(ODE) having the form 


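$$
\begin{cases} \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t)) & (t > 0)\\ \mathbf{x}(0) = x^0. \end{cases} \tag{1.1}
$$

We are here given the initial point $x^0 \in \mathbb{R}^n$ and the function $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^n$. The unknown is the curve $\mathbf{x}: [0,\infty) \to \mathbb{R}^n$, which we interpret as the dynamical evolution of the state of some “system”.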


CONTROLLED DYNAMICS. We generalize a bit and suppose now that $\mathbf{f}$ depends also upon some “control” parameters belonging to a set $A \subset \mathbb{R}^m$; so that $\mathbf{f}: \mathbb{R}^n \times A \to \mathbb{R}^n$.











Then if we select some value $a \in A$ and consider the corresponding dynamics:

$$
\begin{cases} \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), a) & (t > 0)\\ \mathbf{x}(0) = x^0, \end{cases}
$$

we obtain the evolution of our system when the parameter is constantly set to the value $a$.


The next possibility is that we change the value of the parameter as the system evolves. 
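For instance, suppose we define the function $\boldsymbol{\alpha}: [0,\infty) \to A$ this way:

$$
\boldsymbol{\alpha}(t) = \begin{cases} a_1 & 0 \le t \le t_1\\ a_2 & t_1 < t \le t_2\\ a_3 & t_2 < t \le t_3 \quad \text{etc.} \end{cases}
$$

for times $0 < t_1 < t_2 < t_3 < \cdots$ and parameter values $a_1, a_2, a_3, \ldots \in A$; and we then solve the dynamical equation

$$
\begin{cases} \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \boldsymbol{\alpha}(t)) & (t > 0)\\ \mathbf{x}(0) = x^0. \end{cases} \tag{1.2}
$$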
The picture illustrates the resulting evolution. The point is that the system may behave 
quite differently as we change the control parameters. 


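More generally, we call a function $\boldsymbol{\alpha}: [0,\infty) \to A$ a control. Corresponding to each control, we consider the ODE

$$
\begin{cases} \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \boldsymbol{\alpha}(t)) & (t > 0)\\ \mathbf{x}(0) = x^0, \end{cases} \tag{ODE}
$$

and regard the trajectory $\mathbf{x}(\cdot)$ as the corresponding response of the system.

[Figure: CONTROLLED DYNAMICS, showing a trajectory of (ODE)]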
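To make the notion of a response concrete, here is a minimal sketch that approximates the trajectory of (ODE) for a given control. It assumes the simplest possible scheme, forward Euler with a fixed step, which is our own choice and not something these notes prescribe; the names `simulate` and `alpha` are hypothetical, and for brevity the control is scalar-valued ($m = 1$).

```python
from typing import Callable, List

Vector = List[float]


def simulate(
    f: Callable[[Vector, float], Vector],
    x0: Vector,
    alpha: Callable[[float], float],
    T: float,
    steps: int,
) -> List[Vector]:
    """Forward-Euler approximation of x'(t) = f(x(t), alpha(t)), x(0) = x0.

    Returns the approximate states at the grid times t_k = k*T/steps, k = 0..steps.
    """
    h = T / steps
    x = list(x0)
    path = [list(x)]
    for k in range(steps):
        t = k * h
        dx = f(x, alpha(t))  # the control is evaluated at the left end of each step
        x = [xi + h * di for xi, di in zip(x, dx)]
        path.append(list(x))
    return path


def alpha(t: float) -> float:
    """Hypothetical piecewise-constant control: a1 = 1 on [0, 1), a2 = -1 afterwards."""
    return 1.0 if t < 1.0 else -1.0


# Hypothetical scalar dynamics x' = a, started from x0 = 0.
path = simulate(lambda x, a: [a], [0.0], alpha, T=2.0, steps=200)
print(path[100][0], path[-1][0])  # roughly 1.0 (at t = 1) and 0.0 (at t = 2)
```

Switching the control at $t_1 = 1$ reverses the direction of motion, which is the kind of behavior the piecewise-constant controls above are meant to produce.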


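NOTATION. (i) We will write

$$
\mathbf{f}(x,a) = \begin{pmatrix} f^1(x,a)\\ \vdots\\ f^n(x,a) \end{pmatrix}
$$

to display the components of $\mathbf{f}$, and similarly put

$$
\mathbf{x}(t) = \begin{pmatrix} x^1(t)\\ \vdots\\ x^n(t) \end{pmatrix}.
$$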


We will therefore write vectors as columns in these notes and use boldface for vector-valued 
functions, the components of which have superscripts. 
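(ii) We also introduce

$$
\mathcal{A} = \{\boldsymbol{\alpha}: [0,\infty) \to A \mid \boldsymbol{\alpha}(\cdot) \text{ measurable}\}
$$

to denote the collection of all admissible controls, where

$$
\boldsymbol{\alpha}(t) = \begin{pmatrix} \alpha^1(t)\\ \vdots\\ \alpha^m(t) \end{pmatrix}.
$$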


Note very carefully that our solution $\mathbf{x}(\cdot)$ of (ODE) depends upon $\boldsymbol{\alpha}(\cdot)$ and the initial condition. Consequently our notation would be more precise, but more complicated, if we were to write

$$
\mathbf{x}(\cdot) = \mathbf{x}(\cdot, \boldsymbol{\alpha}(\cdot), x^0),
$$

displaying the dependence of the response $\mathbf{x}(\cdot)$ upon the control and the initial value.














PAYOFFS. Our overall task will be to determine what is the “best” control for our system. For this we need to specify a payoff (or reward) criterion. Let us define the payoff functional

$$
P[\boldsymbol{\alpha}(\cdot)] := \int_0^T r(\mathbf{x}(t), \boldsymbol{\alpha}(t))\,dt + g(\mathbf{x}(T)), \tag{P}
$$

where $\mathbf{x}(\cdot)$ solves (ODE) for the control $\boldsymbol{\alpha}(\cdot)$. Here $r: \mathbb{R}^n \times A \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}$ are given, and we call $r$ the running payoff and $g$ the terminal payoff. The terminal time $T > 0$ is given as well.
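The payoff functional (P) can be approximated in the same spirit. The sketch below is only an illustration under our own assumptions: it advances the state by forward Euler and approximates the running payoff by a left Riemann sum on the same grid; the function name `payoff` is hypothetical. As test data we borrow the production model of Example 1 below, with the arbitrary choices $k = 1$ and $x^0 = 1$.

```python
from typing import Callable, List

Vector = List[float]


def payoff(
    f: Callable[[Vector, float], Vector],
    r: Callable[[Vector, float], float],
    g: Callable[[Vector], float],
    x0: Vector,
    alpha: Callable[[float], float],
    T: float,
    steps: int = 1000,
) -> float:
    """Approximate P[alpha] = int_0^T r(x(t), alpha(t)) dt + g(x(T))."""
    h = T / steps
    x = list(x0)
    running = 0.0
    for k in range(steps):
        a = alpha(k * h)
        running += h * r(x, a)  # running payoff on [t_k, t_{k+1}], left endpoint
        dx = f(x, a)
        x = [xi + h * di for xi, di in zip(x, dx)]
    return running + g(x)       # add the terminal payoff g(x(T))


K = 1.0  # growth rate k of Example 1 (arbitrary choice)


def f_prod(x: Vector, a: float) -> Vector:
    return [K * a * x[0]]


def r_prod(x: Vector, a: float) -> float:
    return (1.0 - a) * x[0]


def g_prod(x: Vector) -> float:
    return 0.0


consume_all = payoff(f_prod, r_prod, g_prod, [1.0], lambda t: 0.0, T=2.0, steps=200)
reinvest_all = payoff(f_prod, r_prod, g_prod, [1.0], lambda t: 1.0, T=2.0, steps=200)
print(consume_all, reinvest_all)  # about 2.0 (= T * x0) and exactly 0.0
```

Neither extreme is optimal in general, which is what makes the problem interesting.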


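THE BASIC PROBLEM. Our aim is to find a control $\boldsymbol{\alpha}^*(\cdot)$ which maximizes the payoff. In other words, we want

$$
P[\boldsymbol{\alpha}^*(\cdot)] \ge P[\boldsymbol{\alpha}(\cdot)]
$$

for all controls $\boldsymbol{\alpha}(\cdot) \in \mathcal{A}$. Such a control $\boldsymbol{\alpha}^*(\cdot)$ is called optimal.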


This task presents us with these mathematical issues: 


(i) Does an optimal control exist? 
(ii) How can we characterize an optimal control mathematically? 


(iii) How can we construct an optimal control? 


These turn out to be sometimes subtle problems, as the following collection of examples 
illustrates. 


1.2 EXAMPLES 


EXAMPLE 1: CONTROL OF PRODUCTION AND CONSUMPTION. 
Suppose we own, say, a factory whose output we can control. Let us begin to construct 
a mathematical model by setting 


$x(t)$ = amount of output produced at time $t \ge 0$.


We suppose that we consume some fraction of our output at each time, and likewise can 
reinvest the remaining fraction. Let us denote 


$\alpha(t)$ = fraction of output reinvested at time $t \ge 0$.

This will be our control, and is subject to the obvious constraint that

$$
0 \le \alpha(t) \le 1 \quad \text{for each time } t \ge 0.
$$


Given such a control, the corresponding dynamics are provided by the ODE 


$$
\begin{cases} \dot x(t) = k\alpha(t)x(t)\\ x(0) = x^0, \end{cases}
$$

the constant $k > 0$ modelling the growth rate of our reinvestment. Let us take as a payoff functional

$$
P[\alpha(\cdot)] = \int_0^T (1 - \alpha(t))x(t)\,dt.
$$


The meaning is that we want to maximize our total consumption of the output, our consumption at a given time $t$ being $(1 - \alpha(t))x(t)$. This model fits into our general framework for $n = m = 1$, once we put

$$
A = [0,1], \quad f(x,a) = kax, \quad r(x,a) = (1-a)x, \quad g \equiv 0.
$$


[Figure: A BANG-BANG CONTROL]


As we will see later in §4.4.2, an optimal control $\alpha^*(\cdot)$ is given by

$$
\alpha^*(t) = \begin{cases} 1 & \text{if } 0 \le t \le t^*\\ 0 & \text{if } t^* < t \le T \end{cases}
$$

for an appropriate switching time $0 \le t^* \le T$. In other words, we should reinvest all the output (and therefore consume nothing) up until time $t^*$, and afterwards, we should consume everything (and therefore reinvest nothing). The switchover time $t^*$ will have to be determined. We call $\alpha^*(\cdot)$ a bang-bang control.
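If we restrict attention to controls of the bang-bang form above, the payoff becomes an ordinary function of the switching time. While $\alpha = 1$ we have $\dot x = kx$, so $x(t^*) = x^0e^{kt^*}$; afterwards $x$ stays constant and is consumed at rate $x(t^*)$ for the remaining time, giving $P = (T - t^*)x^0e^{kt^*}$. Elementary calculus then suggests $t^* = T - 1/k$ when $kT \ge 1$, and $t^* = 0$ otherwise. The sketch below (function names hypothetical, parameter values arbitrary) compares that formula with a brute-force grid search. It does not by itself show that this control is optimal among all of $\mathcal{A}$; that is the business of §4.4.2.

```python
import math


def bang_bang_payoff(s: float, k: float, T: float, x0: float) -> float:
    """Payoff in Example 1 of alpha = 1 on [0, s] and alpha = 0 on (s, T]."""
    # While alpha = 1: x' = k x, so x(s) = x0 e^{ks}; nothing is consumed yet.
    # Afterwards: x' = 0, and the constant output x(s) is consumed for time T - s.
    return (T - s) * x0 * math.exp(k * s)


def best_switch_on_grid(k: float, T: float, x0: float, n: int = 10_000) -> float:
    """Brute-force maximization of bang_bang_payoff over s = j*T/n, j = 0..n."""
    grid = [j * T / n for j in range(n + 1)]
    return max(grid, key=lambda s: bang_bang_payoff(s, k, T, x0))


def switch_from_calculus(k: float, T: float) -> float:
    """d/ds of the payoff is x0 e^{ks} (k (T - s) - 1), which vanishes at s = T - 1/k."""
    return max(T - 1.0 / k, 0.0)


# Grid search versus the calculus formula, for k = 1, T = 3, x0 = 1.
print(best_switch_on_grid(1.0, 3.0, 1.0), switch_from_calculus(1.0, 3.0))
```

When $kT < 1$ the formula gives $t^* = 0$: the season is too short for reinvestment to pay off, so we consume from the start.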





EXAMPLE 2: REPRODUCTIVE STRATEGIES IN SOCIAL INSECTS

The next example is from Chapter 2 of the book Caste and Ecology in Social Insects, by G. Oster and E. O. Wilson [O-W]. We attempt to model how social insects, say a population of bees, determine the makeup of their society.


Let us write T for the length of the season, and introduce the variables 


$$
\begin{aligned}
w(t) &= \text{number of workers at time } t\\
q(t) &= \text{number of queens}\\
\alpha(t) &= \text{fraction of colony effort devoted to increasing work force}
\end{aligned}
$$

The control $\alpha$ is constrained by our requiring that

$$
0 \le \alpha(t) \le 1.
$$


We continue to model by introducing dynamics for the numbers of workers and the 
number of queens. The worker population evolves according to 


$$
\begin{cases} \dot w(t) = -\mu w(t) + bs(t)\alpha(t)w(t)\\ w(0) = w^0. \end{cases}
$$

Here $\mu$ is a given constant (a death rate), $b$ is another constant, and $s(t)$ is the known rate at which each worker contributes to the bee economy.
We suppose also that the population of queens changes according to

$$
\begin{cases} \dot q(t) = -\nu q(t) + c(1 - \alpha(t))s(t)w(t)\\ q(0) = q^0, \end{cases}
$$

for constants $\nu$ and $c$.


Our goal, or rather the bees’, is to maximize the number of queens at time $T$:

$$
P[\alpha(\cdot)] = q(T).
$$

So in terms of our general notation, we have $\mathbf{x}(t) = (w(t), q(t))^T$ and $x^0 = (w^0, q^0)^T$. We are taking the running payoff to be $r \equiv 0$, and the terminal payoff $g(w,q) = q$.


The answer will again turn out to be a bang-bang control, as we will explain later. 
EXAMPLE 3: A PENDULUM. 














We look next at a hanging pendulum, for which 


$$
\theta(t) = \text{angle at time } t.
$$

If there is no external force, then we have the equation of motion

$$
\begin{cases} \ddot\theta(t) + \lambda\dot\theta(t) + \omega^2\theta(t) = 0\\ \theta(0) = \theta_1, \quad \dot\theta(0) = \theta_2; \end{cases}
$$

the solution of which is a damped oscillation, provided $\lambda > 0$.
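Now let $\alpha(\cdot)$ denote an applied torque, subject to the physical constraint that

$$
|\alpha| \le 1.
$$

Our dynamics now become

$$
\begin{cases} \ddot\theta(t) + \lambda\dot\theta(t) + \omega^2\theta(t) = \alpha(t)\\ \theta(0) = \theta_1, \quad \dot\theta(0) = \theta_2. \end{cases}
$$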


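Define $x_1(t) = \theta(t)$, $x_2(t) = \dot\theta(t)$, and $\mathbf{x}(t) = (x_1(t), x_2(t))$. Then we can write the evolution as the system

$$
\dot{\mathbf{x}}(t) = \begin{pmatrix} \dot x_1\\ \dot x_2 \end{pmatrix} = \begin{pmatrix} x_2\\ -\lambda x_2 - \omega^2 x_1 + \alpha(t) \end{pmatrix} = \mathbf{f}(\mathbf{x}, \alpha).
$$

We introduce as well

$$
P[\alpha(\cdot)] = -\int_0^\tau 1\,dt = -\tau,
$$

for

$$
\tau = \tau(\alpha(\cdot)) = \text{first time that } \mathbf{x}(\tau) = 0 \quad (\text{that is, } \theta(\tau) = \dot\theta(\tau) = 0).
$$

We want to maximize $P[\cdot]$, meaning that we want to minimize the time it takes to bring the pendulum to rest.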

Observe that this problem does not quite fall within the general framework described 
in §1.1, since the terminal time is not fixed, but rather depends upon the control. This is 











called a fixed endpoint, free time problem. 
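Writing the pendulum as the first-order system above is what lets it fit the framework of §1.1. As a sketch under our own assumptions (forward Euler, and the arbitrary values $\lambda = 0.5$, $\omega = 1$), the code below implements $\mathbf{f}(\mathbf{x}, a)$ and checks that with no torque the oscillation decays toward rest. The names `pendulum_rhs` and `euler` are hypothetical, and `lam` stands for $\lambda$ because `lambda` is a Python keyword.

```python
import math
from typing import List


def pendulum_rhs(x: List[float], a: float, lam: float, omega: float) -> List[float]:
    """f(x, a) for x = (x1, x2) = (theta, theta'): x1' = x2, x2' = -lam x2 - omega^2 x1 + a."""
    x1, x2 = x
    return [x2, -lam * x2 - omega ** 2 * x1 + a]


def euler(x0: List[float], a: float, lam: float, omega: float, T: float, steps: int) -> List[float]:
    """Forward Euler with a constant torque a; returns the approximate state at time T."""
    h = T / steps
    x = list(x0)
    for _ in range(steps):
        dx = pendulum_rhs(x, a, lam, omega)
        x = [x[0] + h * dx[0], x[1] + h * dx[1]]
    return x


# No torque (a = 0), released at angle 1 with zero angular velocity.
x_end = euler([1.0, 0.0], a=0.0, lam=0.5, omega=1.0, T=20.0, steps=20_000)
print(math.hypot(*x_end))  # small: the oscillation has largely damped out
```

Note that the state only approaches rest here; steering it to rest in finite time is exactly what the torque $\alpha$ is for.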





EXAMPLE 4: A MOON LANDER 

This model asks us to bring a spacecraft to a soft landing on the lunar surface, using 
the least amount of fuel. 

We introduce the notation 


$$
\begin{aligned}
h(t) &= \text{height at time } t\\
v(t) &= \text{velocity} = \dot h(t)\\
m(t) &= \text{mass of spacecraft (changing as fuel is burned)}\\
\alpha(t) &= \text{thrust at time } t
\end{aligned}
$$

We assume that

$$
0 \le \alpha(t) \le 1,
$$


and Newton’s law tells us that 


$$
m\ddot h = -gm + \alpha,
$$

the right hand side being the difference of the gravitational force and the thrust of the rocket. This system is modeled by the ODE

$$
\begin{cases} \dot v(t) = -g + \dfrac{\alpha(t)}{m(t)}\\ \dot h(t) = v(t)\\ \dot m(t) = -k\alpha(t). \end{cases}
$$

[Figure: A SPACECRAFT LANDING ON THE MOON, at height $h(t)$ above the moon’s surface]


We summarize these equations in the form

$$
\dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \alpha(t))
$$

for $\mathbf{x}(t) = (v(t), h(t), m(t))$.
We want to minimize the amount of fuel used up, that is, to maximize the amount 
remaining once we have landed. Thus 


$$
P[\alpha(\cdot)] = m(\tau),
$$

where

$$
\tau \text{ denotes the first time that } h(\tau) = v(\tau) = 0.
$$

This is a variable endpoint problem, since the final time is not given in advance. We have also the extra constraints

$$
h(t) \ge 0, \quad m(t) \ge 0.
$$














EXAMPLE 5: ROCKET RAILROAD CAR. 
Imagine a railroad car powered by rocket engines on each side. We introduce the variables

$$
\begin{aligned}
q(t) &= \text{position at time } t\\
v(t) &= \dot q(t) = \text{velocity at time } t\\
\alpha(t) &= \text{thrust from rockets,}
\end{aligned}
$$

where

$$
-1 \le \alpha(t) \le 1,
$$

the sign depending upon which engine is firing.

[Figure: A ROCKET CAR ON A TRAIN TRACK, with rocket engines on each side]

We want to figure out how to fire the rockets, so as to arrive at the origin 0 with zero velocity in a minimum amount of time. Assuming the car has mass $m$, the law of motion is

$$
m\ddot q(t) = \alpha(t).
$$


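We rewrite by setting $\mathbf{x}(t) = (q(t), v(t))^T$ and, for simplicity, taking $m = 1$. Then

$$
\begin{cases} \dot{\mathbf{x}}(t) = \begin{pmatrix} 0 & 1\\ 0 & 0 \end{pmatrix}\mathbf{x}(t) + \begin{pmatrix} 0\\ 1 \end{pmatrix}\alpha(t)\\ \mathbf{x}(0) = x^0 = (q_0, v_0)^T. \end{cases}
$$

Since our goal is to steer to the origin $(0,0)$ in minimum time, we take

$$
P[\alpha(\cdot)] = -\int_0^\tau 1\,dt = -\tau,
$$

for

$$
\tau = \text{first time that } q(\tau) = v(\tau) = 0.
$$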


1.3 A GEOMETRIC SOLUTION. 


To illustrate how actually to solve a control problem, in this last section we introduce 
some ad hoc calculus and geometry methods for the rocket car problem, Example 5 above. 


First of all, let us guess that to find an optimal solution we need only consider the cases $\alpha = 1$ or $\alpha = -1$. In other words, we will focus our attention only upon those controls for which at each moment of time either the left or the right rocket engine is fired at full power. (We will later see in Chapter 2 some theoretical justification for looking only at such controls.)


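CASE 1: Suppose first that $\alpha \equiv 1$ for some time interval, during which

$$
\begin{cases} \dot q = v\\ \dot v = 1. \end{cases}
$$

Then

$$
v\dot v = \dot q,
$$

and so

$$
\frac{1}{2}\frac{d}{dt}(v^2) = \dot q.
$$

Let $t_0$ belong to this time interval, and integrate from $t_0$ to $t$:

$$
\frac{v^2(t)}{2} - \frac{v^2(t_0)}{2} = q(t) - q(t_0).
$$

Consequently

$$
v^2(t) = 2q(t) + \underbrace{(v^2(t_0) - 2q(t_0))}_{=:\,b}. \tag{1.1}
$$

In other words, so long as the control is set for $\alpha \equiv 1$, the trajectory stays on the curve

$$
v^2 = 2q + b \quad \text{for some constant } b.
$$

[Figure: the curves $v^2 = 2q + b$]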


CASE 2: Suppose now $\alpha \equiv -1$ on some time interval. Then as above

$$
\begin{cases} \dot q = v\\ \dot v = -1, \end{cases}
$$

and hence

$$
\frac{1}{2}\frac{d}{dt}(v^2) = -\dot q.
$$

[Figure: the curves $v^2 = -2q + c$]

Let $t_1$ belong to an interval where $\alpha \equiv -1$ and integrate:

$$
v^2(t) = -2q(t) + \underbrace{(2q(t_1) + v^2(t_1))}_{=:\,c}. \tag{1.2}
$$

Consequently, as long as the control is set for $\alpha \equiv -1$, the trajectory stays on the curve

$$
v^2 = -2q + c \quad \text{for some constant } c.
$$
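The conserved quantities in (1.1) and (1.2) can also be checked directly from the explicit solution $q(t) = q_0 + v_0t + \frac{a}{2}t^2$, $v(t) = v_0 + at$ for a constant control $a = \pm 1$: the quantity $v^2 - 2aq$ stays equal to $v_0^2 - 2aq_0$. The sketch below (helper names `flow` and `invariant` hypothetical, starting point arbitrary) evaluates it at a few times.

```python
from typing import Tuple


def flow(q0: float, v0: float, a: float, t: float) -> Tuple[float, float]:
    """Exact solution of q' = v, v' = a (constant a) after time t, from (q0, v0)."""
    return q0 + v0 * t + 0.5 * a * t * t, v0 + a * t


def invariant(q: float, v: float, a: float) -> float:
    """v^2 - 2 a q: the constant b of (1.1) when a = 1, the constant c of (1.2) when a = -1."""
    return v * v - 2.0 * a * q


q0, v0 = 0.7, -1.3  # arbitrary starting point
for a in (1.0, -1.0):
    values = [invariant(*flow(q0, v0, a, t), a) for t in (0.0, 0.5, 1.0, 2.5)]
    print(a, values)  # each list is constant up to round-off
```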


GEOMETRIC INTERPRETATION. Formula (1.1) says if $\alpha \equiv 1$, then $(q(t), v(t))$ lies on a parabola of the form

$$
v^2 = 2q + b.
$$

Similarly, (1.2) says if $\alpha \equiv -1$, then $(q(t), v(t))$ lies on a parabola

$$
v^2 = -2q + c.
$$


Now we can design an optimal control $\alpha^*(\cdot)$, which causes the trajectory to jump between the families of right- and left-pointing parabolas, as drawn. Say we start at the black dot, and wish to steer to the origin. This we accomplish by first setting the control to the value $\alpha = -1$, causing us to move down along the second family of parabolas. We then switch to the control $\alpha = 1$, and thereupon move to a parabola from the first family, along which we move up and to the left, ending up at the origin. See the picture.

[Figure: HOW TO GET TO THE ORIGIN IN MINIMAL TIME]
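The two-arc recipe just described can be turned into a small calculation. The sketch below is our own illustration, not a construction taken from these notes. The final arc is the curve through the origin $q = -v|v|/2$ (the branch of $v^2 = 2q$ with $v < 0$ together with the branch of $v^2 = -2q$ with $v > 0$); starting to its right we use $\alpha = -1$ first, starting to its left $\alpha = +1$ first, and the switch happens where the first parabola meets that curve. The helper names `flow`, `two_arc_plan` and `run` are hypothetical. Because each arc is integrated exactly, the check that the car ends at the origin involves no discretization error. The code only verifies that the plan reaches the origin; it does not prove that the elapsed time is minimal.

```python
import math
from typing import Tuple


def flow(q: float, v: float, a: float, t: float) -> Tuple[float, float]:
    """Exact solution of q' = v, v' = a after time t, for a constant control a."""
    return q + v * t + 0.5 * a * t * t, v + a * t


def two_arc_plan(q0: float, v0: float) -> Tuple[float, float, float]:
    """Return (a1, t1, t2): use alpha = a1 for time t1, then alpha = -a1 for time t2."""
    s = q0 + v0 * abs(v0) / 2.0          # s > 0: right of the curve q = -v|v|/2
    if s == 0.0:                         # already on the final arc: one braking arc
        return -math.copysign(1.0, v0), abs(v0), 0.0
    a1 = -1.0 if s > 0 else 1.0
    c = v0 * v0 - 2.0 * a1 * q0          # first arc: v^2 = 2*a1*q + c, cf. (1.1), (1.2)
    v_switch = a1 * math.sqrt(c / 2.0)   # where it meets the final arc v^2 = -2*a1*q
    t1 = (v_switch - v0) / a1            # v changes at rate a1 on the first arc
    t2 = abs(v_switch)                   # then v returns to 0 at rate -a1
    return a1, t1, t2


def run(q0: float, v0: float):
    """Follow the plan exactly; return the final state and the elapsed time."""
    a1, t1, t2 = two_arc_plan(q0, v0)
    q, v = flow(q0, v0, a1, t1)
    q, v = flow(q, v, -a1, t2)
    return (q, v), t1 + t2


print(run(1.0, 0.0))  # ((0.0, 0.0), 2.0)
```

For a start at rest at $q_0 = 1$, the plan brakes toward the origin with $\alpha = -1$ for one time unit and then with $\alpha = +1$ for one more, arriving at rest at time 2.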


1.4 OVERVIEW. 
Here are the topics we will cover in this course: 


• Chapter 2: Controllability, bang-bang principle.


In this chapter, we introduce the simplest class of dynamics, those linear in both the 
state x(-) and the control a(-), and derive algebraic conditions ensuring that the system 
can be steered into a given terminal state. We introduce as well some abstract theorems 
from functional analysis and employ them to prove the existence of so-called “bang-bang” 
optimal controls. 


• Chapter 3: Time-optimal control.

In Chapter 3 we continue to study linear control problems, and turn our attention to 
finding optimal controls that steer our system into a given state as quickly as possible. We 
introduce a maximization principle useful for characterizing an optimal control, and will 


later recognize this as a first instance of the Pontryagin Maximum Principle. 


• Chapter 4: Pontryagin Maximum Principle.

Chapter 4’s discussion of the Pontryagin Maximum Principle and its variants is at 
the heart of these notes. We postpone proof of this important insight to the Appendix, 
preferring instead to illustrate its usefulness with many examples with nonlinear dynamics. 


• Chapter 5: Dynamic programming.

Dynamic programming provides an alternative approach to designing optimal controls, assuming we can solve a nonlinear partial differential equation, called the Hamilton-Jacobi-Bellman equation. This chapter explains the basic theory, works out some examples, and discusses connections with the Pontryagin Maximum Principle.


• Chapter 6: Game theory.
We discuss briefly two-person, zero-sum differential games and how dynamic programming and maximum principle methods apply.


• Chapter 7: Introduction to stochastic control theory.


This chapter provides a very brief introduction to the control of stochastic differential 
equations by dynamic programming techniques. The Itô stochastic calculus tells us how 
the random effects modify the corresponding Hamilton-Jacobi-Bellman equation. 


• Appendix: Proof of the Pontryagin Maximum Principle.
We provide here the proof of this important assertion, discussing clearly the key ideas.


CHAPTER 2: CONTROLLABILITY, BANG-BANG PRINCIPLE
2.1 Definitions 
2.2 Quick review of linear ODE 
2.3 Controllability of linear equations 
2.4 Observability 
2.5 Bang-bang principle 
2.6 References 


2.1 DEFINITIONS. 
We firstly recall from Chapter 1 the basic form of our controlled ODE: 


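$$
\begin{cases} \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \boldsymbol{\alpha}(t)) & (t > 0)\\ \mathbf{x}(0) = x^0. \end{cases} \tag{ODE}
$$

Here $x^0 \in \mathbb{R}^n$, $\mathbf{f}: \mathbb{R}^n \times A \to \mathbb{R}^n$, $\boldsymbol{\alpha}: [0,\infty) \to A$ is the control, and $\mathbf{x}: [0,\infty) \to \mathbb{R}^n$ is the response of the system.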





This chapter addresses the following basic 


CONTROLLABILITY QUESTION: Given the initial point $x^0$ and a “target” set $S \subset \mathbb{R}^n$, does there exist a control steering the system to $S$ in finite time?


For the time being we will therefore not introduce any payoff criterion that would 
characterize an “optimal” control, but instead will focus on the question as to whether 
or not there exist controls that steer the system to a given goal. In this chapter we will 
mostly consider the problem of driving the system to the origin S = {0}. 


DEFINITION. We define the reachable set for time t to be 


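$$
\mathcal{C}(t) = \text{set of initial points } x^0 \text{ for which there exists a control such that } \mathbf{x}(t) = 0,
$$

and the overall reachable set

$$
\mathcal{C} = \text{set of initial points } x^0 \text{ for which there exists a control such that } \mathbf{x}(t) = 0 \text{ for some finite time } t.
$$

Note that

$$
\mathcal{C} = \bigcup_{t \ge 0} \mathcal{C}(t).
$$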


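Hereafter, let $\mathbb{M}^{n \times m}$ denote the set of all $n \times m$ matrices. We assume for the rest of this and the next chapter that our ODE is linear in both the state $\mathbf{x}(\cdot)$ and the control $\boldsymbol{\alpha}(\cdot)$, and consequently has the form

$$
\begin{cases} \dot{\mathbf{x}}(t) = M\mathbf{x}(t) + N\boldsymbol{\alpha}(t) & (t > 0)\\ \mathbf{x}(0) = x^0, \end{cases} \tag{ODE}
$$

where $M \in \mathbb{M}^{n \times n}$ and $N \in \mathbb{M}^{n \times m}$. We assume the set $A$ of control parameters is a cube in $\mathbb{R}^m$:

$$
A = [-1,1]^m = \{a \in \mathbb{R}^m \mid |a_i| \le 1, \ i = 1, \ldots, m\}.
$$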











2.2 QUICK REVIEW OF LINEAR ODE. 


This section records for later reference some basic facts about linear systems of ordinary 
differential equations. 


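DEFINITION. Let $X(\cdot): \mathbb{R} \to \mathbb{M}^{n \times n}$ be the unique solution of the matrix ODE

$$
\begin{cases} \dot X(t) = MX(t) & (t \in \mathbb{R})\\ X(0) = I. \end{cases}
$$

We call $X(\cdot)$ a fundamental solution, and sometimes write

$$
X(t) = e^{tM} := \sum_{k=0}^\infty \frac{t^k M^k}{k!},
$$

the last formula being the definition of the exponential $e^{tM}$. Observe that

$$
X^{-1}(t) = X(-t).
$$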
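The series definition of $e^{tM}$ is easy to try out. The following sketch (pure Python; the names `mat_mul` and `expm_series` and the truncation after 40 terms are our own assumptions) sums the series and checks the identity $X^{-1}(t) = X(-t)$ for the rotation generator $M = \begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}$, for which $e^{tM} = \begin{pmatrix} \cos t & \sin t\\ -\sin t & \cos t \end{pmatrix}$. For the rocket car matrix $M = \begin{pmatrix} 0 & 1\\ 0 & 0 \end{pmatrix}$ we have $M^2 = 0$, so the series stops after two terms and $e^{tM} = \begin{pmatrix} 1 & t\\ 0 & 1 \end{pmatrix}$.

```python
import math
from typing import List

Matrix = List[List[float]]


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def expm_series(M: Matrix, t: float, terms: int = 40) -> Matrix:
    """Approximate X(t) = e^{tM} by the partial sum of t^k M^k / k!, k < terms."""
    n = len(M)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # k = 0 term: I
    result = [row[:] for row in term]
    for k in range(1, terms):
        # t^k M^k / k! = (t^{k-1} M^{k-1} / (k-1)!) * M * (t / k)
        term = [[entry * t / k for entry in row] for row in mat_mul(term, M)]
        result = [[x + y for x, y in zip(rrow, trow)] for rrow, trow in zip(result, term)]
    return result


M_rot = [[0.0, 1.0], [-1.0, 0.0]]
X1 = expm_series(M_rot, 1.0)
print(X1)                                     # close to [[cos 1, sin 1], [-sin 1, cos 1]]
print(mat_mul(X1, expm_series(M_rot, -1.0)))  # close to the identity, since X(t)^{-1} = X(-t)
```

Truncating the series is fine for small $|t|\,\|M\|$; for serious work a dedicated matrix-exponential routine would be preferable.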


THEOREM 2.1 (SOLVING LINEAR SYSTEMS OF ODE). 
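(i) The unique solution of the homogeneous system of ODE

$$
\begin{cases} \dot{\mathbf{x}}(t) = M\mathbf{x}(t)\\ \mathbf{x}(0) = x^0 \end{cases}
$$

is

$$
\mathbf{x}(t) = X(t)x^0 = e^{tM}x^0.
$$

(ii) The unique solution of the nonhomogeneous system

$$
\begin{cases} \dot{\mathbf{x}}(t) = M\mathbf{x}(t) + \mathbf{f}(t)\\ \mathbf{x}(0) = x^0 \end{cases}
$$

is

$$
\mathbf{x}(t) = X(t)x^0 + X(t)\int_0^t X^{-1}(s)\mathbf{f}(s)\,ds.
$$

This expression is the variation of parameters formula.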
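In the scalar case $n = 1$, where $M$ is a number and $X(t) = e^{tM}$, the variation of parameters formula can be checked against a known solution. The sketch below is our own: Simpson's rule is an arbitrary choice of quadrature, and the function names are hypothetical. It does this for $\dot x = -x + 1$, $x(0) = 0$, whose solution is $x(t) = 1 - e^{-t}$.

```python
import math
from typing import Callable


def simpson(fun: Callable[[float], float], a: float, b: float, n: int = 200) -> float:
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = fun(a) + fun(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * fun(a + i * h)
    return total * h / 3


def variation_of_parameters(M: float, f: Callable[[float], float], x0: float, t: float) -> float:
    """Theorem 2.1(ii) with n = 1: x(t) = X(t) x0 + X(t) int_0^t X^{-1}(s) f(s) ds, X(t) = e^{tM}."""
    X = math.exp(t * M)
    return X * x0 + X * simpson(lambda s: math.exp(-s * M) * f(s), 0.0, t)


# x' = -x + 1, x(0) = 0 has the exact solution x(t) = 1 - e^{-t}.
print(variation_of_parameters(-1.0, lambda s: 1.0, 0.0, 2.0), 1.0 - math.exp(-2.0))
```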


2.3 CONTROLLABILITY OF LINEAR EQUATIONS. 
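According to the variation of parameters formula, the solution of (ODE) for a given control $\boldsymbol{\alpha}(\cdot)$ is

$$
\mathbf{x}(t) = X(t)x^0 + X(t)\int_0^t X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds,
$$

where $X(t) = e^{tM}$. Furthermore, observe that

$$
x^0 \in \mathcal{C}(t)
$$

if and only if

$$
\text{there exists a control } \boldsymbol{\alpha}(\cdot) \in \mathcal{A} \text{ such that } \mathbf{x}(t) = 0 \tag{2.1}
$$

if and only if

$$
0 = X(t)x^0 + X(t)\int_0^t X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds \quad \text{for some control } \boldsymbol{\alpha}(\cdot) \in \mathcal{A} \tag{2.2}
$$

if and only if

$$
x^0 = -\int_0^t X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds \quad \text{for some control } \boldsymbol{\alpha}(\cdot) \in \mathcal{A}. \tag{2.3}
$$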


We make use of these formulas to study the reachable set: 


THEOREM 2.2 (STRUCTURE OF REACHABLE SET). 
(i) The reachable set C is symmetric and convex. 
(ii) Also, if $x^0 \in \mathcal{C}(\hat t)$, then $x^0 \in \mathcal{C}(t)$ for all times $t \ge \hat t$.


DEFINITIONS. 

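(i) We say a set $S$ is symmetric if $x \in S$ implies $-x \in S$.

(ii) The set $S$ is convex if $x, \hat x \in S$ and $0 \le \lambda \le 1$ imply $\lambda x + (1-\lambda)\hat x \in S$.

Proof. 1. (Symmetry) Let $t \ge 0$ and $x^0 \in \mathcal{C}(t)$. Then $x^0 = -\int_0^t X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds$ for some admissible control $\boldsymbol{\alpha} \in \mathcal{A}$. Therefore $-x^0 = -\int_0^t X^{-1}(s)N(-\boldsymbol{\alpha}(s))\,ds$, and $-\boldsymbol{\alpha} \in \mathcal{A}$ since the set $A$ is symmetric. Therefore $-x^0 \in \mathcal{C}(t)$, and so each set $\mathcal{C}(t)$ is symmetric. It follows that $\mathcal{C}$ is symmetric.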


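2. (Convexity) Take $x^0, \hat x^0 \in \mathcal{C}$; so that $x^0 \in \mathcal{C}(t)$, $\hat x^0 \in \mathcal{C}(\hat t)$ for appropriate times $t, \hat t \ge 0$. Assume $t \le \hat t$. Then

$$
\begin{aligned}
x^0 &= -\int_0^t X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds \quad \text{for some control } \boldsymbol{\alpha} \in \mathcal{A},\\
\hat x^0 &= -\int_0^{\hat t} X^{-1}(s)N\hat{\boldsymbol{\alpha}}(s)\,ds \quad \text{for some control } \hat{\boldsymbol{\alpha}} \in \mathcal{A}.
\end{aligned}
$$

Define a new control

$$
\tilde{\boldsymbol{\alpha}}(s) := \begin{cases} \boldsymbol{\alpha}(s) & \text{if } 0 \le s \le t\\ 0 & \text{if } s > t. \end{cases}
$$

Then

$$
x^0 = -\int_0^{\hat t} X^{-1}(s)N\tilde{\boldsymbol{\alpha}}(s)\,ds,
$$

and hence $x^0 \in \mathcal{C}(\hat t)$. Now let $0 \le \lambda \le 1$, and observe

$$
\lambda x^0 + (1-\lambda)\hat x^0 = -\int_0^{\hat t} X^{-1}(s)N\big(\lambda\tilde{\boldsymbol{\alpha}}(s) + (1-\lambda)\hat{\boldsymbol{\alpha}}(s)\big)\,ds.
$$

Since the cube $A$ is convex, the control $\lambda\tilde{\boldsymbol{\alpha}} + (1-\lambda)\hat{\boldsymbol{\alpha}}$ belongs to $\mathcal{A}$. Therefore $\lambda x^0 + (1-\lambda)\hat x^0 \in \mathcal{C}(\hat t) \subseteq \mathcal{C}$.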











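3. Assertion (ii) follows from the foregoing, since the construction of $\tilde{\boldsymbol{\alpha}}$ in step 2 (extending a control by zero) shows that a point of $\mathcal{C}(t)$ also lies in $\mathcal{C}(\hat t)$ for every later time $\hat t \ge t$.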





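A SIMPLE EXAMPLE. Let $n = 2$ and $m = 1$, $A = [-1,1]$, and write $\mathbf{x}(t) = (x^1(t), x^2(t))^T$. Suppose

$$
\begin{cases} \dot x^1 = 0\\ \dot x^2 = \alpha(t). \end{cases}
$$

This is a system of the form $\dot{\mathbf{x}} = M\mathbf{x} + N\alpha$, for

$$
M = \begin{pmatrix} 0 & 0\\ 0 & 0 \end{pmatrix}, \quad N = \begin{pmatrix} 0\\ 1 \end{pmatrix}.
$$

Clearly $\mathcal{C} = \{(x^1, x^2) \mid x^1 = 0\}$, the $x^2$-axis.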














We next wish to establish some general algebraic conditions ensuring that C contains a 
neighborhood of the origin. 


DEFINITION. The controllability matrix is 


$$
G = G(M,N) := \underbrace{[N, MN, M^2N, \ldots, M^{n-1}N]}_{n \times (mn) \text{ matrix}}.
$$
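The controllability matrix is straightforward to assemble, and its rank can be computed exactly. The sketch below is our own illustration (pure Python, using `fractions.Fraction` to avoid round-off; the names `controllability_matrix` and `rank` are hypothetical). It is applied to the simple example above, where $\mathcal{C}$ is only the $x^2$-axis, and to the rocket railroad car of Example 5 with $m = 1$.

```python
from fractions import Fraction
from typing import List, Sequence

Matrix = List[List[Fraction]]


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    """Product of two matrices stored as lists of rows."""
    return [[sum((A[i][k] * B[k][j] for k in range(len(B))), Fraction(0))
             for j in range(len(B[0]))] for i in range(len(A))]


def controllability_matrix(M: Sequence[Sequence[int]], N: Sequence[Sequence[int]]) -> Matrix:
    """G = [N, MN, M^2 N, ..., M^{n-1} N], returned as n rows of length n*m."""
    n = len(M)
    Mq = [[Fraction(x) for x in row] for row in M]
    block = [[Fraction(x) for x in row] for row in N]
    blocks = [block]
    for _ in range(n - 1):
        block = mat_mul(Mq, block)
        blocks.append(block)
    return [[x for blk in blocks for x in blk[i]] for i in range(n)]


def rank(A: Sequence[Sequence[Fraction]]) -> int:
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


G_simple = controllability_matrix([[0, 0], [0, 0]], [[0], [1]])  # A SIMPLE EXAMPLE
G_car = controllability_matrix([[0, 1], [0, 0]], [[0], [1]])     # rocket railroad car
print(rank(G_simple), rank(G_car))  # 1 2
```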


THEOREM 2.3 (CONTROLLABILITY MATRIX). We have 
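$$
\operatorname{rank} G = n
$$

if and only if

$$
0 \in \mathcal{C}^\circ.
$$

NOTATION. We write $\mathcal{C}^\circ$ for the interior of the set $\mathcal{C}$. Remember that

$$
\begin{aligned}
\operatorname{rank} \text{ of } G &= \text{number of linearly independent rows of } G\\
&= \text{number of linearly independent columns of } G.
\end{aligned}
$$

Clearly $\operatorname{rank} G \le n$.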





Proof. 1. Suppose first that $\operatorname{rank} G < n$. This means that the linear span of the columns of $G$ has dimension less than or equal to $n - 1$. Thus there exists a vector $b \in \mathbb{R}^n$, $b \ne 0$, orthogonal to each column of $G$. This implies

$$
b^T G = 0
$$

and so

$$
b^T N = b^T MN = \cdots = b^T M^{n-1}N = 0.
$$
2. We claim next that in fact

$$
b^T M^k N = 0 \quad \text{for all positive integers } k. \tag{2.4}
$$

To confirm this, recall that

$$
p(\lambda) := \det(\lambda I - M)
$$

is the characteristic polynomial of $M$. The Cayley-Hamilton Theorem states that

$$
p(M) = 0.
$$

So if we write

$$
p(\lambda) = \lambda^n + \beta_{n-1}\lambda^{n-1} + \cdots + \beta_1\lambda + \beta_0,
$$

then

$$
p(M) = M^n + \beta_{n-1}M^{n-1} + \cdots + \beta_1 M + \beta_0 I = 0.
$$

Therefore

$$
M^n = -\beta_{n-1}M^{n-1} - \beta_{n-2}M^{n-2} - \cdots - \beta_1 M - \beta_0 I,
$$

and so

$$
b^T M^n N = b^T(-\beta_{n-1}M^{n-1} - \cdots)N = 0.
$$

Similarly, $b^T M^{n+1}N = b^T(-\beta_{n-1}M^n - \cdots)N = 0$, etc. The claim (2.4) is proved.
Now notice that

$$
b^T X^{-1}(s)N = b^T e^{-sM}N = b^T \sum_{k=0}^\infty \frac{(-s)^k M^k N}{k!} = \sum_{k=0}^\infty \frac{(-s)^k}{k!}\, b^T M^k N = 0,
$$

according to (2.4).


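3. Assume next that $x^0 \in \mathcal{C}(t)$. This is equivalent to having

$$
x^0 = -\int_0^t X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds \quad \text{for some control } \boldsymbol{\alpha}(\cdot) \in \mathcal{A}.
$$

Then

$$
b \cdot x^0 = -\int_0^t b^T X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds = 0.
$$

This says that $b$ is orthogonal to $x^0$. In other words, $\mathcal{C}$ must lie in the hyperplane orthogonal to $b \ne 0$. Consequently $\mathcal{C}^\circ = \emptyset$.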


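4. Conversely, assume $0 \notin \mathcal{C}^\circ$. Thus $0 \notin \mathcal{C}^\circ(t)$ for all $t \ge 0$. Since $\mathcal{C}(t)$ is convex, there exists a supporting hyperplane to $\mathcal{C}(t)$ through 0. This means that there exists $b \ne 0$ such that $b \cdot x^0 \le 0$ for all $x^0 \in \mathcal{C}(t)$.

Choose any $x^0 \in \mathcal{C}(t)$. Then

$$
x^0 = -\int_0^t X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds
$$

for some control $\boldsymbol{\alpha}$, and therefore

$$
0 \ge b \cdot x^0 = -\int_0^t b^T X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds.
$$

Thus

$$
\int_0^t b^T X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds \ge 0 \quad \text{for all controls } \boldsymbol{\alpha}(\cdot).
$$

We assert that therefore

$$
b^T X^{-1}(s)N \equiv 0, \tag{2.5}
$$

a proof of which follows as a lemma below. We rewrite (2.5) as

$$
b^T e^{-sM}N \equiv 0. \tag{2.6}
$$

Let $s = 0$ to see that $b^T N = 0$. Next differentiate (2.6) with respect to $s$, to find that

$$
b^T(-M)e^{-sM}N \equiv 0.
$$

For $s = 0$ this says

$$
b^T MN = 0.
$$

We repeatedly differentiate, to deduce

$$
b^T M^k N = 0 \quad \text{for all } k = 0, 1, \ldots,
$$

and so $b^T G = 0$. This implies $\operatorname{rank} G < n$, since $b \ne 0$.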
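LEMMA 2.4 (INTEGRAL INEQUALITIES). Assume that

$$
\int_0^t b^T X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds \ge 0 \tag{2.7}
$$

for all $\boldsymbol{\alpha}(\cdot) \in \mathcal{A}$. Then

$$
b^T X^{-1}(s)N \equiv 0.
$$

Proof. Replacing $\boldsymbol{\alpha}$ by $-\boldsymbol{\alpha}$ in (2.7), we see that

$$
\int_0^t b^T X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds = 0
$$

for all $\boldsymbol{\alpha}(\cdot) \in \mathcal{A}$. Define

$$
v(s) := b^T X^{-1}(s)N,
$$

a row vector with $m$ components. If $v \not\equiv 0$, then $v(s_0) \ne 0$ for some $s_0$. Then, by continuity, there exists an interval $I$ such that $s_0 \in I$ and $v \ne 0$ on $I$. Now define $\boldsymbol{\alpha}(\cdot) \in \mathcal{A}$ this way:

$$
\boldsymbol{\alpha}(s) = \begin{cases} 0 & (s \notin I)\\ \dfrac{v(s)^T}{|v(s)|} & (s \in I), \end{cases}
$$

where $|v| := \big(\sum_{i=1}^m |v_i|^2\big)^{1/2}$. Each component of the unit vector $v(s)^T/|v(s)|$ has absolute value at most 1, so $\boldsymbol{\alpha}$ indeed takes values in the cube $A$. Then

$$
0 = \int_0^t v(s)\boldsymbol{\alpha}(s)\,ds = \int_I \frac{v(s)v(s)^T}{|v(s)|}\,ds = \int_I |v(s)|\,ds.
$$

This implies the contradiction that $v \equiv 0$ in $I$.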
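Step 2 of the proof of Theorem 2.3 rests on the Cayley-Hamilton Theorem $p(M) = 0$. As an illustration only, the sketch below computes the coefficients $\beta_0, \ldots, \beta_{n-1}$ of $p(\lambda) = \det(\lambda I - M)$ by the Faddeev-LeVerrier recursion (our choice of method; it is not used in these notes) in exact rational arithmetic, and then checks $p(M) = 0$ for a few sample matrices. The function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence

Matrix = List[List[Fraction]]


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    """Product of two matrices stored as lists of rows."""
    return [[sum((A[i][k] * B[k][j] for k in range(len(B))), Fraction(0))
             for j in range(len(B[0]))] for i in range(len(A))]


def char_poly(M: Sequence[Sequence[int]]) -> List[Fraction]:
    """Coefficients [beta_0, ..., beta_{n-1}, 1] of p(lambda) = det(lambda I - M)."""
    n = len(M)
    Mq = [[Fraction(x) for x in row] for row in M]
    coeffs = [Fraction(0)] * n + [Fraction(1)]   # coeffs[k] multiplies lambda^k
    Mk = [[Fraction(0)] * n for _ in range(n)]   # M_0 = 0
    for k in range(1, n + 1):
        # Faddeev-LeVerrier: M_k = M M_{k-1} + c_{n-k+1} I,  c_{n-k} = -tr(M M_k) / k
        AM = mat_mul(Mq, Mk)
        Mk = [[AM[i][j] + (coeffs[n - k + 1] if i == j else 0) for j in range(n)]
              for i in range(n)]
        MMk = mat_mul(Mq, Mk)
        coeffs[n - k] = -sum(MMk[i][i] for i in range(n)) / k
    return coeffs


def poly_at_matrix(coeffs: List[Fraction], M: Sequence[Sequence[int]]) -> Matrix:
    """Evaluate p(M) = sum_k coeffs[k] M^k by Horner's rule."""
    n = len(M)
    Mq = [[Fraction(x) for x in row] for row in M]
    P = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        P = mat_mul(P, Mq)
        P = [[P[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return P


M2 = [[1, 2], [3, 4]]
print(char_poly(M2))                        # coefficients of lambda^2 - 5 lambda - 2
print(poly_at_matrix(char_poly(M2), M2))    # the zero matrix, as Cayley-Hamilton asserts
```

For the rocket car matrix the same routine returns the coefficients of $p(\lambda) = \lambda^2$.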

















DEFINITION. We say the linear system (ODE) is controllable if $\mathcal{C} = \mathbb{R}^n$.


THEOREM 2.5 (CRITERION FOR CONTROLLABILITY). Let $A$ be the cube $[-1,1]^m$ in $\mathbb{R}^m$. Suppose as well that $\operatorname{rank} G = n$, and $\operatorname{Re}\lambda < 0$ for each eigenvalue $\lambda$ of the matrix $M$.

Then the system (ODE) is controllable.














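Proof. Since $\operatorname{rank} G = n$, Theorem 2.3 tells us that $\mathcal{C}$ contains some ball $B$ centered at 0. Now take any $x^0 \in \mathbb{R}^n$ and consider the evolution

$$
\begin{cases} \dot{\mathbf{x}}(t) = M\mathbf{x}(t)\\ \mathbf{x}(0) = x^0; \end{cases}
$$

in other words, take the control $\boldsymbol{\alpha}(\cdot) \equiv 0$. Since $\operatorname{Re}\lambda < 0$ for each eigenvalue $\lambda$ of $M$, the origin is asymptotically stable. So there exists a time $T$ such that $\mathbf{x}(T) \in B$. Thus $\mathbf{x}(T) \in B \subset \mathcal{C}$; and hence there exists a control $\boldsymbol{\alpha}(\cdot) \in \mathcal{A}$ steering $\mathbf{x}(T)$ into 0 in finite time.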














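EXAMPLE. We once again consider the rocket railroad car, from §1.2, for which $n = 2$, $m = 1$, $A = [-1,1]$, and

$$
\dot{\mathbf{x}} = \begin{pmatrix} 0 & 1\\ 0 & 0 \end{pmatrix}\mathbf{x} + \begin{pmatrix} 0\\ 1 \end{pmatrix}\alpha.
$$

Then

$$
G = (N, MN) = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}.
$$

Therefore

$$
\operatorname{rank} G = 2 = n.
$$

Also, the characteristic polynomial of the matrix $M$ is

$$
p(\lambda) = \det(\lambda I - M) = \det\begin{pmatrix} \lambda & -1\\ 0 & \lambda \end{pmatrix} = \lambda^2.
$$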














Since the eigenvalues are both 0, we fail to satisfy the hypotheses of Theorem 2.5. 
This example motivates the following extension of the previous theorem: 


THEOREM 2.6 (IMPROVED CRITERION FOR CONTROLLABILITY). Assume

$$
\operatorname{rank} G = n \quad \text{and} \quad \operatorname{Re}\lambda \le 0 \text{ for each eigenvalue } \lambda \text{ of } M.
$$

Then the system (ODE) is controllable.


Proof. 1. If $\mathcal{C} \ne \mathbb{R}^n$, then the convexity of $\mathcal{C}$ implies that there exist a vector $b \ne 0$ and a real number $\mu$ such that

$$
b \cdot x^0 \le \mu \tag{2.8}
$$

for all $x^0 \in \mathcal{C}$. Indeed, in the picture we see that $b \cdot (x^0 - z^0) \le 0$; and this implies (2.8) for $\mu := b \cdot z^0$.

We will derive a contradiction.











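2. Given $b \ne 0$, $\mu \in \mathbb{R}$, our intention is to find $x^0 \in \mathcal{C}$ so that (2.8) fails. Recall $x^0 \in \mathcal{C}$ if and only if there exist a time $t > 0$ and a control $\boldsymbol{\alpha}(\cdot) \in \mathcal{A}$ such that

$$
x^0 = -\int_0^t X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds.
$$

Then

$$
b \cdot x^0 = -\int_0^t b^T X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds.
$$

Define

$$
v(s) := b^T X^{-1}(s)N.
$$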


3. We assert that

$$
v \not\equiv 0. \tag{2.9}
$$

To see this, suppose instead that $v \equiv 0$. Then $k$ times differentiate the expression $b^T X^{-1}(s)N$ with respect to $s$ and set $s = 0$, to discover

$$
b^T M^k N = 0
$$

for $k = 0, 1, 2, \ldots$. This implies $b$ is orthogonal to the columns of $G$, and so $\operatorname{rank} G < n$. This is a contradiction to our hypothesis, and therefore (2.9) holds.


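4. Next, define $\boldsymbol{\alpha}(\cdot)$ this way:

$$
\boldsymbol{\alpha}(s) := \begin{cases} -\dfrac{v(s)^T}{|v(s)|} & \text{if } v(s) \ne 0\\ 0 & \text{if } v(s) = 0. \end{cases}
$$

Then

$$
b \cdot x^0 = -\int_0^t v(s)\boldsymbol{\alpha}(s)\,ds = \int_0^t |v(s)|\,ds.
$$

We want to find a time $t > 0$ so that $\int_0^t |v(s)|\,ds > \mu$. In fact, we assert that

$$
\int_0^\infty |v(s)|\,ds = +\infty. \tag{2.10}
$$

To begin the proof of (2.10), introduce the function

$$
\phi(t) := \int_t^\infty v(s)\,ds.
$$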
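We will find an ODE $\phi$ satisfies. Take $p(\cdot)$ to be the characteristic polynomial of $M$. Then

$$
p\Big(-\frac{d}{dt}\Big)v(t) = p\Big(-\frac{d}{dt}\Big)\big(b^T e^{-tM}N\big) = b^T\Big(p\Big(-\frac{d}{dt}\Big)e^{-tM}\Big)N = b^T\big(p(M)e^{-tM}\big)N \equiv 0,
$$

since $p(M) = 0$, according to the Cayley-Hamilton Theorem. But since $p\big(-\frac{d}{dt}\big)v(t) \equiv 0$, it follows that

$$
-\frac{d}{dt}\,p\Big(-\frac{d}{dt}\Big)\phi(t) = p\Big(-\frac{d}{dt}\Big)\Big(-\frac{d}{dt}\phi(t)\Big) = p\Big(-\frac{d}{dt}\Big)v(t) = 0.
$$

Hence $\phi$ solves the $(n+1)^{\text{th}}$ order ODE

$$
\frac{d}{dt}\,p\Big(-\frac{d}{dt}\Big)\phi(t) = 0.
$$

We also know $\phi(\cdot) \not\equiv 0$. Let $\mu_1, \ldots, \mu_{n+1}$ be the solutions of $\mu\,p(-\mu) = 0$. According to ODE theory, we can write

$$
\phi(t) = \text{sum of terms of the form } p_i(t)e^{\mu_i t}
$$

for appropriate polynomials $p_i(\cdot)$.

Furthermore, we see that $\mu_{n+1} = 0$ and $\mu_k = -\lambda_k$, where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $M$. By assumption $\operatorname{Re}\mu_k \ge 0$, for $k = 1, \ldots, n$. If $\int_0^\infty |v(s)|\,ds < \infty$, then

$$
|\phi(t)| \le \int_t^\infty |v(s)|\,ds \to 0 \quad \text{as } t \to \infty;
$$

that is, $\phi(t) \to 0$ as $t \to \infty$. This is a contradiction to the representation formula $\phi(t) = \sum p_i(t)e^{\mu_i t}$, with $\operatorname{Re}\mu_i \ge 0$. Assertion (2.10) is proved.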


5. Consequently given any $\mu$, there exists $t > 0$ such that

$$
b \cdot x^0 = \int_0^t |v(s)|\,ds > \mu,
$$

a contradiction to (2.8). Therefore $\mathcal{C} = \mathbb{R}^n$.








2.4 OBSERVABILITY 


We again consider the linear system of ODE 


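$$
\begin{cases} \dot{\mathbf{x}}(t) = M\mathbf{x}(t)\\ \mathbf{x}(0) = x^0, \end{cases} \tag{ODE}
$$

where $M \in \mathbb{M}^{n \times n}$.
In this section we address the observability problem, modeled as follows. We suppose that we can observe

$$
\mathbf{y}(t) := N\mathbf{x}(t) \quad (t \ge 0), \tag{O}
$$

for a given matrix $N \in \mathbb{M}^{m \times n}$. Consequently, $\mathbf{y}(t) \in \mathbb{R}^m$. The interesting situation is when $m \ll n$ and we interpret $\mathbf{y}(\cdot)$ as low-dimensional “observations” or “measurements” of the high-dimensional dynamics $\mathbf{x}(\cdot)$.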


OBSERVABILITY QUESTION: Given the observations $\mathbf{y}(\cdot)$, can we in principle reconstruct $\mathbf{x}(\cdot)$? In particular, do observations of $\mathbf{y}(\cdot)$ provide enough information for us to deduce the initial value $x^0$ for (ODE)?

DEFINITION. The pair (ODE),(O) is called observable if the knowledge of $\mathbf{y}(\cdot)$ on any time interval $[0,t]$ allows us to compute $x^0$.


More precisely, (ODE),(O) is observable if for all solutions $\mathbf{x}_1(\cdot)$, $\mathbf{x}_2(\cdot)$, $N\mathbf{x}_1(\cdot) \equiv N\mathbf{x}_2(\cdot)$ on a time interval $[0,t]$ implies $\mathbf{x}_1(0) = \mathbf{x}_2(0)$.

TWO SIMPLE EXAMPLES. (i) If $N \equiv 0$, then clearly the system is not observable.
(ii) On the other hand, if $m = n$ and $N$ is invertible, then clearly the system is observable, since $\mathbf{x}(t) = N^{-1}\mathbf{y}(t)$.











The interesting cases lie between these extremes. 





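THEOREM 2.7 (OBSERVABILITY AND CONTROLLABILITY). The system

$$
\begin{cases} \dot{\mathbf{x}}(t) = M\mathbf{x}(t)\\ \mathbf{y}(t) = N\mathbf{x}(t) \end{cases} \tag{2.11}
$$

is observable if and only if the system

$$
\dot{\mathbf{z}}(t) = M^T\mathbf{z}(t) + N^T\boldsymbol{\alpha}(t), \quad A = \mathbb{R}^m \tag{2.12}
$$

is controllable, meaning that $\mathcal{C} = \mathbb{R}^n$.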


INTERPRETATION. This theorem asserts that somehow “observability and controllability are dual concepts” for linear systems.


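Proof. 1. Suppose (2.11) is not observable. Then there exist points $x^1 \ne x^2 \in \mathbb{R}^n$, such that

$$
\begin{cases} \dot{\mathbf{x}}_1(t) = M\mathbf{x}_1(t), & \mathbf{x}_1(0) = x^1\\ \dot{\mathbf{x}}_2(t) = M\mathbf{x}_2(t), & \mathbf{x}_2(0) = x^2 \end{cases}
$$

but $\mathbf{y}(t) := N\mathbf{x}_1(t) = N\mathbf{x}_2(t)$ for all times $t \ge 0$. Let

$$
\mathbf{x}(t) := \mathbf{x}_1(t) - \mathbf{x}_2(t), \quad x^0 := x^1 - x^2.
$$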
Then

$$
\dot{\mathbf{x}}(t) = M\mathbf{x}(t), \quad \mathbf{x}(0) = x^0 \ne 0,
$$

but

$$
N\mathbf{x}(t) = 0 \quad (t \ge 0).
$$

Now

$$
\mathbf{x}(t) = X(t)x^0 = e^{tM}x^0.
$$

Thus

$$
Ne^{tM}x^0 = 0 \quad (t \ge 0).
$$

Let $t = 0$, to find $Nx^0 = 0$. Then differentiate this expression $k$ times in $t$ and let $t = 0$, to discover as well that

$$
NM^k x^0 = 0
$$

for $k = 0, 1, 2, \ldots$. Hence $(x^0)^T(M^k)^T N^T = 0$, and hence $(x^0)^T(M^T)^k N^T = 0$. This implies

$$
(x^0)^T[N^T, M^T N^T, \ldots, (M^T)^{n-1}N^T] = 0.
$$

Since $x^0 \ne 0$, $\operatorname{rank}[N^T, \ldots, (M^T)^{n-1}N^T] < n$. Thus problem (2.12) is not controllable. Consequently, (2.12) controllable implies (2.11) is observable.
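2. Assume now (2.12) not controllable. Then $\operatorname{rank}[N^T, \ldots, (M^T)^{n-1}N^T] < n$, and consequently, as in step 1 of the proof of Theorem 2.3, there exists $x^0 \ne 0$ such that

$$
(x^0)^T[N^T, \ldots, (M^T)^{n-1}N^T] = 0.
$$

That is, $NM^k x^0 = 0$ for all $k = 0, 1, 2, \ldots, n-1$.
We want to show that $\mathbf{y}(t) = N\mathbf{x}(t) \equiv 0$, where

$$
\begin{cases} \dot{\mathbf{x}}(t) = M\mathbf{x}(t)\\ \mathbf{x}(0) = x^0. \end{cases}
$$

According to the Cayley-Hamilton Theorem, we can write

$$
M^n = -\beta_{n-1}M^{n-1} - \cdots - \beta_0 I
$$

for appropriate constants. Consequently $NM^n x^0 = 0$. Likewise,

$$
M^{n+1} = M(-\beta_{n-1}M^{n-1} - \cdots - \beta_0 I) = -\beta_{n-1}M^n - \cdots - \beta_0 M;
$$

and so $NM^{n+1}x^0 = 0$. Similarly, $NM^k x^0 = 0$ for all $k$.
Now

$$
\mathbf{x}(t) = X(t)x^0 = e^{tM}x^0 = \sum_{k=0}^\infty \frac{t^k M^k}{k!}x^0,
$$

and therefore $N\mathbf{x}(t) = N\sum_{k=0}^\infty \frac{t^k M^k}{k!}x^0 = 0$.

We have shown that if (2.12) is not controllable, then (2.11) is not observable.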
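By the proof of Theorem 2.7, the pair (2.11) is observable exactly when $\operatorname{rank}[N^T, M^TN^T, \ldots, (M^T)^{n-1}N^T] = n$. The sketch below (our own pure-Python illustration; all names are hypothetical) tests this for the rocket car dynamics with two choices of observation: measuring the position $q$, and measuring only the velocity $v$. Intuitively, the velocity alone cannot reveal where the car started.

```python
from fractions import Fraction
from typing import List, Sequence

Matrix = List[List[Fraction]]


def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    """Product of two matrices stored as lists of rows."""
    return [[sum((A[i][k] * B[k][j] for k in range(len(B))), Fraction(0))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A: Sequence[Sequence[int]]) -> List[list]:
    return [list(col) for col in zip(*A)]


def rank(A: Sequence[Sequence[Fraction]]) -> int:
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def krylov(M: Sequence[Sequence[int]], B: Sequence[Sequence[int]]) -> Matrix:
    """The matrix [B, MB, ..., M^{n-1} B], as a list of n rows."""
    n = len(M)
    Mq = [[Fraction(x) for x in row] for row in M]
    block = [[Fraction(x) for x in row] for row in B]
    blocks = [block]
    for _ in range(n - 1):
        block = mat_mul(Mq, block)
        blocks.append(block)
    return [[x for blk in blocks for x in blk[i]] for i in range(n)]


def is_observable(M: Sequence[Sequence[int]], N: Sequence[Sequence[int]]) -> bool:
    """Test (2.11) via controllability of (2.12): rank [N^T, ..., (M^T)^{n-1} N^T] = n."""
    return rank(krylov(transpose(M), transpose(N))) == len(M)


M = [[0, 1], [0, 0]]                # rocket car, x = (q, v)
print(is_observable(M, [[1, 0]]))   # observe q: True
print(is_observable(M, [[0, 1]]))   # observe v only: False
```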














2.5 BANG-BANG PRINCIPLE. 


For this section, we will again take $A$ to be the cube $[-1,1]^m$ in $\mathbb{R}^m$.
DEFINITION. A control $\boldsymbol{\alpha}(\cdot) \in \mathcal{A}$ is called bang-bang if for each time $t \ge 0$ and each index $i = 1, \ldots, m$, we have $|\alpha^i(t)| = 1$, where

$$
\boldsymbol{\alpha}(t) = \begin{pmatrix} \alpha^1(t)\\ \vdots\\ \alpha^m(t) \end{pmatrix}.
$$

THEOREM 2.8 (BANG-BANG PRINCIPLE). Let $t > 0$ and suppose $x^0 \in \mathcal{C}(t)$, for the system

$$
\dot{\mathbf{x}}(t) = M\mathbf{x}(t) + N\boldsymbol{\alpha}(t).
$$

Then there exists a bang-bang control $\boldsymbol{\alpha}(\cdot)$ which steers $x^0$ to 0 at time $t$.
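Theorem 2.8 can be seen by hand in the simplest case, which we choose only for illustration: $n = m = 1$, $M = 0$, $N = 1$, so that $\dot x = \alpha(t)$ and, by (2.3), $x^0 \in \mathcal{C}(t)$ exactly when $\int_0^t \alpha(s)\,ds = -x^0$ for some $|\alpha| \le 1$. A control equal to $+1$ on $[0,s)$ and $-1$ on $[s,t]$ has integral $2s - t$, so the switching time $s = (t - x^0)/2$ yields a bang-bang control with the same effect as any admissible one. The sketch below (function names hypothetical) compares it with the non-bang-bang control $\alpha \equiv 1/2$.

```python
def switching_time(x0: float, t: float) -> float:
    """Switch time s for alpha = +1 on [0, s), -1 on [s, t], steering x0 to 0 at time t
    for the scalar system x' = alpha (requires |x0| <= t, i.e. x0 in C(t))."""
    if abs(x0) > t:
        raise ValueError("x0 is not in C(t) for this system")
    return (t - x0) / 2.0


def integral(alpha, t: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of int_0^t alpha(s) ds."""
    h = t / n
    return sum(alpha((j + 0.5) * h) for j in range(n)) * h


t, x0 = 2.0, -1.0
s = switching_time(x0, t)                                   # s = 1.5
print(x0 + integral(lambda r: 0.5, t))                      # alpha = 1/2: x(t) is about 0
print(x0 + integral(lambda r: 1.0 if r < s else -1.0, t))  # bang-bang: x(t) is about 0
```

In higher dimensions no such explicit formula is available in general, which is why the proof below goes through functional analysis.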


To prove the theorem we need some tools from functional analysis, among them the Krein-Milman Theorem, expressing the geometric fact that every nonempty convex set which is compact (in an appropriate topology) has an extreme point.


2.5.1 SOME FUNCTIONAL ANALYSIS. We will study the “geometry” of certain 


infinite dimensional spaces of functions. 


NOTATION: 











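$$
L^\infty = L^\infty(0,t;\mathbb{R}^m) = \{\boldsymbol{\alpha}(\cdot): (0,t) \to \mathbb{R}^m \mid \sup_{0 \le s \le t}|\boldsymbol{\alpha}(s)| < \infty\},
$$

$$
\|\boldsymbol{\alpha}\|_{L^\infty} = \sup_{0 \le s \le t}|\boldsymbol{\alpha}(s)|.
$$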


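DEFINITION. Let $\boldsymbol{\alpha}_n \in L^\infty$ for $n = 1, \ldots$ and $\boldsymbol{\alpha} \in L^\infty$. We say $\boldsymbol{\alpha}_n$ converges to $\boldsymbol{\alpha}$ in the weak* sense, written

$$
\boldsymbol{\alpha}_n \overset{*}{\rightharpoonup} \boldsymbol{\alpha},
$$

provided

$$
\int_0^t \boldsymbol{\alpha}_n(s) \cdot \mathbf{v}(s)\,ds \to \int_0^t \boldsymbol{\alpha}(s) \cdot \mathbf{v}(s)\,ds
$$

as $n \to \infty$, for all $\mathbf{v}(\cdot): [0,t] \to \mathbb{R}^m$ satisfying $\int_0^t |\mathbf{v}(s)|\,ds < \infty$.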
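Weak* limits of bang-bang controls need not be bang-bang. The sketch below (our own illustration; the name `pairing` is hypothetical) takes $m = 1$, $t = 2\pi$ and the bang-bang controls $\alpha_n(s) = \operatorname{sign}(\sin(ns))$, and evaluates $\int_0^t \alpha_n(s)v(s)\,ds$ for the single test function $v(s) = s$. An elementary computation gives the exact value $-\pi^2/n$, which tends to $0 = \int_0^t 0 \cdot v(s)\,ds$, consistent with $\alpha_n \overset{*}{\rightharpoonup} 0$; checking one $v$ is of course not a proof.

```python
import math
from typing import Callable


def pairing(n: int, v: Callable[[float], float], t: float, cells: int) -> float:
    """Midpoint-rule value of int_0^t alpha_n(s) v(s) ds with alpha_n(s) = sign(sin(n s))."""
    h = t / cells
    total = 0.0
    for j in range(cells):
        s = (j + 0.5) * h
        total += math.copysign(1.0, math.sin(n * s)) * v(s) * h
    return total


t = 2.0 * math.pi
for n in (1, 10, 100):
    # With 2000*n cells every cell lies inside one half-period of sin(n s), and v is
    # linear, so the midpoint rule is exact here up to round-off.
    approx = pairing(n, lambda s: s, t, cells=2000 * n)
    print(n, approx, -math.pi ** 2 / n)
```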


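We will need the following useful weak* compactness theorem for $L^\infty$:

ALAOGLU’S THEOREM. Let $\boldsymbol{\alpha}_n \in \mathcal{A}$, $n = 1, \ldots$. Then there exists a subsequence $\boldsymbol{\alpha}_{n_k}$ and $\boldsymbol{\alpha} \in \mathcal{A}$, such that

$$
\boldsymbol{\alpha}_{n_k} \overset{*}{\rightharpoonup} \boldsymbol{\alpha}.
$$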
DEFINITIONS. (i) The set $K$ is convex if for all $x, \hat x \in K$ and all real numbers $0 \le \lambda \le 1$,

$$
\lambda x + (1-\lambda)\hat x \in K.
$$

(ii) A point $z \in K$ is called extreme provided there do not exist points $x, \hat x \in K$ with $x \ne \hat x$ and $0 < \lambda < 1$ such that

$$
z = \lambda x + (1-\lambda)\hat x.
$$


KREIN-MILMAN THEOREM. Let $K$ be a convex, nonempty subset of $L^\infty$, which is compact in the weak* topology.

Then $K$ has at least one extreme point.


2.5.2 APPLICATION TO BANG-BANG CONTROLS. 
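The foregoing abstract theory will be useful for us in the following setting. We will take $K$ to be the set of controls which steer $x^0$ to 0 at time $t$, prove it satisfies the hypotheses of the Krein-Milman Theorem and finally show that an extreme point is a bang-bang control.

So consider again the linear dynamics

$$
\begin{cases} \dot{\mathbf{x}}(t) = M\mathbf{x}(t) + N\boldsymbol{\alpha}(t)\\ \mathbf{x}(0) = x^0. \end{cases} \tag{ODE}
$$

Take $x^0 \in \mathcal{C}(t)$ and write

$$
K = \{\boldsymbol{\alpha}(\cdot) \in \mathcal{A} \mid \boldsymbol{\alpha}(\cdot) \text{ steers } x^0 \text{ to } 0 \text{ at time } t\}.
$$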


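LEMMA 2.9 (GEOMETRY OF SET OF CONTROLS). The collection $K$ of admissible controls satisfies the hypotheses of the Krein-Milman Theorem.

Proof. Since $x^0 \in \mathcal{C}(t)$, we see that $K \ne \emptyset$.
Next we show that $K$ is convex. For this, recall that $\boldsymbol{\alpha}(\cdot) \in K$ if and only if

$$
x^0 = -\int_0^t X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds.
$$

Now take also $\hat{\boldsymbol{\alpha}} \in K$ and $0 \le \lambda \le 1$. Then

$$
x^0 = -\int_0^t X^{-1}(s)N\hat{\boldsymbol{\alpha}}(s)\,ds;
$$

and so

$$
x^0 = -\int_0^t X^{-1}(s)N\big(\lambda\boldsymbol{\alpha}(s) + (1-\lambda)\hat{\boldsymbol{\alpha}}(s)\big)\,ds.
$$

Since $A$ is convex, $\lambda\boldsymbol{\alpha} + (1-\lambda)\hat{\boldsymbol{\alpha}} \in \mathcal{A}$. Hence $\lambda\boldsymbol{\alpha} + (1-\lambda)\hat{\boldsymbol{\alpha}} \in K$.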
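Lastly, we confirm the compactness. Let $\boldsymbol{\alpha}_n \in K$ for $n = 1, \ldots$. According to Alaoglu’s Theorem there exist $n_k \to \infty$ and $\boldsymbol{\alpha} \in \mathcal{A}$ such that $\boldsymbol{\alpha}_{n_k} \overset{*}{\rightharpoonup} \boldsymbol{\alpha}$. We need to show that $\boldsymbol{\alpha} \in K$.

Now $\boldsymbol{\alpha}_{n_k} \in K$ implies

$$
x^0 = -\int_0^t X^{-1}(s)N\boldsymbol{\alpha}_{n_k}(s)\,ds \to -\int_0^t X^{-1}(s)N\boldsymbol{\alpha}(s)\,ds
$$

by definition of weak* convergence (applied with $\mathbf{v}$ equal to each row of the continuous matrix function $X^{-1}(\cdot)N$, written as a column). Hence $\boldsymbol{\alpha} \in K$.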





We can now apply the Krein-Milman Theorem to deduce that there exists an extreme point $\alpha^* \in K$. What is interesting is that such an extreme point corresponds to a bang-bang control.


THEOREM 2.10 (EXTREMALITY AND BANG-BANG PRINCIPLE). The control $\alpha^*(\cdot)$ is bang-bang.


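Proof. 1. We must show that for almost all times $0 \le s \le t$ and for each $i = 1, \dots, m$, we have
$$|\alpha^{i*}(s)| = 1.$$

Suppose not. Then there exists an index $i \in \{1, \dots, m\}$ and a subset $E \subset [0,t]$ of positive measure such that $|\alpha^{i*}(s)| < 1$ for $s \in E$. In fact, there exist a number $\varepsilon > 0$ and a subset $F \subseteq E$ such that
$$|F| > 0 \quad \text{and} \quad |\alpha^{i*}(s)| \le 1 - \varepsilon \ \text{ for } s \in F.$$
Define
$$I_F(\beta(\cdot)) := \int_F X^{-1}(s) N \beta(s)\, ds,$$
for
$$\beta(\cdot) := (0, \dots, \beta(\cdot), \dots, 0)^T,$$
the function $\beta$ in the $i$-th slot. Choose any real-valued function $\beta(\cdot) \not\equiv 0$ such that
$$I_F(\beta(\cdot)) = 0$$
and $|\beta(\cdot)| \le 1$. (Such a $\beta$ exists: $I_F$ is a linear map from the infinite-dimensional space of bounded functions on $F$ into $\mathbb{R}^n$, so its kernel contains a nonzero element, which we may rescale.) Define
$$\alpha_1(\cdot) := \alpha^*(\cdot) + \varepsilon \beta(\cdot), \qquad \alpha_2(\cdot) := \alpha^*(\cdot) - \varepsilon \beta(\cdot),$$
where we redefine $\beta$ to be zero off the set $F$.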
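2. We claim that
$$\alpha_1(\cdot),\ \alpha_2(\cdot) \in K.$$
To see this, observe that
$$-\int_0^t X^{-1}(s) N \alpha_1(s)\, ds = -\int_0^t X^{-1}(s) N \alpha^*(s)\, ds - \varepsilon \int_0^t X^{-1}(s) N \beta(s)\, ds = x^0 - \varepsilon \underbrace{\int_F X^{-1}(s) N \beta(s)\, ds}_{I_F(\beta(\cdot)) = 0} = x^0.$$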


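Note also $\alpha_1(\cdot) \in \mathcal{A}$. Indeed,
$$\alpha_1(s) = \begin{cases} \alpha^*(s) & (s \notin F) \\ \alpha^*(s) + \varepsilon \beta(s) & (s \in F). \end{cases}$$
But on the set $F$, we have $|\alpha^{i*}(s)| \le 1 - \varepsilon$, and therefore
$$|\alpha_1^i(s)| \le |\alpha^{i*}(s)| + \varepsilon |\beta(s)| \le 1 - \varepsilon + \varepsilon = 1.$$
(The other components of $\alpha_1$ agree with those of $\alpha^*$.) Similar considerations apply for $\alpha_2$. Hence $\alpha_1, \alpha_2 \in K$, as claimed above.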


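3. Finally, observe that
$$\alpha_1 = \alpha^* + \varepsilon \beta, \quad \alpha_1 \ne \alpha^*, \qquad \alpha_2 = \alpha^* - \varepsilon \beta, \quad \alpha_2 \ne \alpha^*.$$
But
$$\tfrac{1}{2}\alpha_1 + \tfrac{1}{2}\alpha_2 = \alpha^*;$$
and this is a contradiction, since $\alpha^*$ is an extreme point of $K$.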
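The only delicate step in the proof is the choice of $\beta$ with $I_F(\beta(\cdot)) = 0$. As a concrete sketch (our own illustration), assume $m = 1$ and the rocket railroad car matrices $M = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, $N = (0, 1)^T$ of §1.2, for which $X^{-1}(s)N = (-s, 1)^T$ (as computed in §3.3). If $F = [a,b]$ is an interval (a hypothetical choice), the rescaled Legendre polynomial $P_2$ is orthogonal to both $1$ and $s$ on $[a,b]$ and satisfies $|\beta| \le 1$, so it is an admissible $\beta$.

```python
def beta_on(a, b):
    """beta(s) = P_2(u) with u = (2s - a - b)/(b - a) on F = [a, b], and 0 off F."""
    def beta(s):
        if s < a or s > b:
            return 0.0  # beta is redefined to be zero off F
        u = (2.0 * s - a - b) / (b - a)   # map F = [a, b] onto [-1, 1]
        return 0.5 * (3.0 * u * u - 1.0)  # Legendre P_2(u), |P_2| <= 1 on [-1, 1]
    return beta

def I_F(beta, a, b, steps=20_000):
    """Midpoint-rule value of the integral over F of X^{-1}(s) N beta(s) = (-s, 1) beta(s)."""
    h = (b - a) / steps
    first = second = 0.0
    for k in range(steps):
        s = a + (k + 0.5) * h
        first += -s * beta(s)
        second += beta(s)
    return first * h, second * h

a, b = 0.25, 0.75          # hypothetical set F
beta = beta_on(a, b)
print(I_F(beta, a, b))     # both components are (numerically) zero
```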

















2.6 REFERENCES. 


See Chapters 2 and 3 of Macki-Strauss [M-S]. An interesting recent article on these 
matters is Terrell [T]. 
CHAPTER 3: LINEAR TIME-OPTIMAL CONTROL


3.1 Existence of time-optimal controls 

3.2 The Maximum Principle for linear time-optimal control 
3.3 Examples 

3.4 References 


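3.1 EXISTENCE OF TIME-OPTIMAL CONTROLS.
Consider the linear system of ODE:
$$\text{(ODE)} \qquad \begin{cases} \dot{x}(t) = Mx(t) + N\alpha(t) \\ x(0) = x^0, \end{cases}$$
for given matrices $M \in \mathbb{M}^{n \times n}$ and $N \in \mathbb{M}^{n \times m}$. We will again take $A$ to be the cube $[-1,1]^m \subset \mathbb{R}^m$.

Define next
$$\text{(P)} \qquad P[\alpha(\cdot)] := -\int_0^\tau 1\, ds = -\tau,$$
where $\tau = \tau(\alpha(\cdot))$ denotes the first time the solution of our (ODE) hits the origin $0$. (If the trajectory never hits $0$, we set $\tau = \infty$.)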


OPTIMAL TIME PROBLEM: We are given the starting point $x^0 \in \mathbb{R}^n$, and want to find an optimal control $\alpha^*(\cdot)$ such that
$$P[\alpha^*(\cdot)] = \max_{\alpha(\cdot) \in \mathcal{A}} P[\alpha(\cdot)].$$
Then
$$\tau^* = -P[\alpha^*(\cdot)] \ \text{ is the minimum time to steer to the origin.}$$


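THEOREM 3.1 (EXISTENCE OF TIME-OPTIMAL CONTROL). Let $x^0 \in \mathbb{R}^n$. Then there exists an optimal bang-bang control $\alpha^*(\cdot)$. (Here, as in the proof below, we assume that $x^0 \in \mathcal{C}(t)$ for some $t > 0$, so that $x^0$ can be steered to $0$ in finite time.)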


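Proof. Let $\tau^* := \inf\{t \mid x^0 \in \mathcal{C}(t)\}$. We want to show that $x^0 \in \mathcal{C}(\tau^*)$; that is, there exists an optimal control $\alpha^*(\cdot)$ steering $x^0$ to $0$ at time $\tau^*$.

Choose $t_1 \ge t_2 \ge t_3 \ge \dots$ so that $x^0 \in \mathcal{C}(t_n)$ and $t_n \to \tau^*$. Since $x^0 \in \mathcal{C}(t_n)$, there exists a control $\alpha_n(\cdot) \in \mathcal{A}$ such that
$$x^0 = -\int_0^{t_n} X^{-1}(s) N \alpha_n(s)\, ds.$$

If necessary, redefine $\alpha_n(s)$ to be $0$ for $t_n \le s$. By Alaoglu's Theorem, there exists a subsequence $n_k \to \infty$ and a control $\alpha^*(\cdot)$ so that
$$\alpha_{n_k} \overset{*}{\rightharpoonup} \alpha^*.$$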
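We assert that $\alpha^*(\cdot)$ is an optimal control. It is easy to check that $\alpha^*(s) = 0$ for $s \ge \tau^*$. Also
$$x^0 = -\int_0^{t_{n_k}} X^{-1}(s) N \alpha_{n_k}(s)\, ds = -\int_0^{t_1} X^{-1}(s) N \alpha_{n_k}(s)\, ds,$$
since $\alpha_{n_k} = 0$ for $s \ge t_{n_k}$. Let $n_k \to \infty$:
$$x^0 = -\int_0^{t_1} X^{-1}(s) N \alpha^*(s)\, ds = -\int_0^{\tau^*} X^{-1}(s) N \alpha^*(s)\, ds,$$
because $\alpha^*(s) = 0$ for $s \ge \tau^*$. Hence $x^0 \in \mathcal{C}(\tau^*)$, and therefore $\alpha^*(\cdot)$ is optimal.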











According to Theorem 2.10 (applied with $t = \tau^*$), there in fact exists an optimal bang-bang control.





3.2 THE MAXIMUM PRINCIPLE FOR LINEAR TIME-OPTIMAL CONTROL

The really interesting practical issue now is understanding how to compute an optimal control $\alpha^*(\cdot)$.


DEFINITION. We define $K(t,x^0)$ to be the reachable set for time $t$. That is,
$$K(t,x^0) = \{x^1 \mid \text{there exists } \alpha(\cdot) \in \mathcal{A} \text{ which steers from } x^0 \text{ to } x^1 \text{ at time } t\}.$$
Since $x(\cdot)$ solves (ODE), we have $x^1 \in K(t,x^0)$ if and only if
$$x^1 = X(t)x^0 + X(t)\int_0^t X^{-1}(s) N \alpha(s)\, ds = x(t)$$
for some control $\alpha(\cdot) \in \mathcal{A}$.
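A quick numerical sanity check of the variation-of-constants formula above, written as a sketch under our own assumptions: the rocket railroad car matrices of §1.2, so that $X(t) = e^{tM} = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$ and $X^{-1}(s)N = (-s, 1)^T$ (as computed in §3.3), the hypothetical control $\alpha(s) = \cos s$, and a hypothetical starting point $x^0$. The formula's value is compared with a direct Runge-Kutta integration of (ODE); for this control both should match the exact solution $x^1(t) = x^1_0 + x^2_0 t + 1 - \cos t$, $x^2(t) = x^2_0 + \sin t$.

```python
import math

def alpha(s):
    return math.cos(s)  # hypothetical control with |alpha| <= 1

def formula(x0, t, steps=4000):
    """x(t) = X(t) [x0 + int_0^t X^{-1}(s) N alpha(s) ds], with X^{-1}(s) N = (-s, 1)."""
    h = t / steps
    i1 = i2 = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        i1 += -s * alpha(s)
        i2 += alpha(s)
    z1, z2 = x0[0] + h * i1, x0[1] + h * i2
    return (z1 + t * z2, z2)  # multiply by X(t) = [[1, t], [0, 1]]

def rk4(x0, t, steps=4000):
    """Classical RK4 for x1' = x2, x2' = alpha(t)."""
    def f(s, x):
        return (x[1], alpha(s))
    h = t / steps
    x, s = x0, 0.0
    for _ in range(steps):
        k1 = f(s, x)
        k2 = f(s + h / 2, (x[0] + h / 2 * k1[0], x[1] + h / 2 * k1[1]))
        k3 = f(s + h / 2, (x[0] + h / 2 * k2[0], x[1] + h / 2 * k2[1]))
        k4 = f(s + h, (x[0] + h * k3[0], x[1] + h * k3[1]))
        x = (x[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             x[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        s += h
    return x

x0 = (1.0, -0.5)  # hypothetical starting point
print(formula(x0, 2.0), rk4(x0, 2.0))
```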


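THEOREM 3.2 (GEOMETRY OF THE SET K). The set $K(t,x^0)$ is convex and closed.

Proof. 1. (Convexity) Let $x^1, x^2 \in K(t,x^0)$. Then there exist $\alpha_1, \alpha_2 \in \mathcal{A}$ such that
$$x^1 = X(t)x^0 + X(t)\int_0^t X^{-1}(s) N \alpha_1(s)\, ds,$$
$$x^2 = X(t)x^0 + X(t)\int_0^t X^{-1}(s) N \alpha_2(s)\, ds.$$
Let $0 \le \lambda \le 1$. Then
$$\lambda x^1 + (1-\lambda)x^2 = X(t)x^0 + X(t)\int_0^t X^{-1}(s) N \underbrace{\big(\lambda\alpha_1(s) + (1-\lambda)\alpha_2(s)\big)}_{\in A}\, ds,$$
and hence $\lambda x^1 + (1-\lambda)x^2 \in K(t,x^0)$.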


2. (Closedness) Assume $x^k \in K(t,x^0)$ for $k = 1, 2, \dots$ and $x^k \to y$. We must show $y \in K(t,x^0)$. As $x^k \in K(t,x^0)$, there exists $\alpha_k(\cdot) \in \mathcal{A}$ such that
$$x^k = X(t)x^0 + X(t)\int_0^t X^{-1}(s) N \alpha_k(s)\, ds.$$
According to Alaoglu's Theorem, there exist a subsequence $k_j \to \infty$ and $\alpha \in \mathcal{A}$ such that $\alpha_{k_j} \overset{*}{\rightharpoonup} \alpha$. Let $k = k_j \to \infty$ in the expression above, to find
$$y = X(t)x^0 + X(t)\int_0^t X^{-1}(s) N \alpha(s)\, ds.$$














Thus $y \in K(t,x^0)$, and hence $K(t,x^0)$ is closed.

NOTATION. If $S$ is a set, we write $\partial S$ to denote the boundary of $S$.

Recall that $\tau^*$ denotes the minimum time it takes to steer to $0$, using the optimal control $\alpha^*$. Note that then $0 \in \partial K(\tau^*, x^0)$.


THEOREM 3.3 (PONTRYAGIN MAXIMUM PRINCIPLE FOR LINEAR TIME-OPTIMAL CONTROL). There exists a nonzero vector $h$ such that
$$\text{(M)} \qquad h^T X^{-1}(t) N \alpha^*(t) = \max_{a \in A} \{h^T X^{-1}(t) N a\}$$
for almost every time $0 \le t \le \tau^*$.

INTERPRETATION. The significance of this assertion is that if we know $h$, then the maximization principle (M) provides us with a formula for computing $\alpha^*(\cdot)$, or at least extracting useful information.

We will see in the next chapter that assertion (M) is a special case of the general Pontryagin Maximum Principle.


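Proof. 1. We know $0 \in \partial K(\tau^*,x^0)$. Since $K(\tau^*,x^0)$ is convex, there exists a supporting plane to $K(\tau^*,x^0)$ at $0$; this means that for some $g \ne 0$, we have
$$g \cdot x^1 \le 0 \quad \text{for all } x^1 \in K(\tau^*,x^0).$$

2. Now $x^1 \in K(\tau^*,x^0)$ if and only if there exists $\alpha(\cdot) \in \mathcal{A}$ such that
$$x^1 = X(\tau^*)x^0 + X(\tau^*)\int_0^{\tau^*} X^{-1}(s) N \alpha(s)\, ds.$$
Also
$$0 = X(\tau^*)x^0 + X(\tau^*)\int_0^{\tau^*} X^{-1}(s) N \alpha^*(s)\, ds.$$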
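Since $g \cdot x^1 \le 0$, we deduce that
$$g^T\Big(X(\tau^*)x^0 + X(\tau^*)\int_0^{\tau^*} X^{-1}(s) N \alpha(s)\, ds\Big) \le 0 = g^T\Big(X(\tau^*)x^0 + X(\tau^*)\int_0^{\tau^*} X^{-1}(s) N \alpha^*(s)\, ds\Big).$$
Define $h^T := g^T X(\tau^*)$; note that $h \ne 0$, since $g \ne 0$ and $X(\tau^*)$ is invertible. Then
$$\int_0^{\tau^*} h^T X^{-1}(s) N \alpha(s)\, ds \le \int_0^{\tau^*} h^T X^{-1}(s) N \alpha^*(s)\, ds;$$
and therefore
$$\int_0^{\tau^*} h^T X^{-1}(s) N \big(\alpha^*(s) - \alpha(s)\big)\, ds \ge 0$$
for all controls $\alpha(\cdot) \in \mathcal{A}$.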


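3. We claim now that the foregoing implies
$$h^T X^{-1}(s) N \alpha^*(s) = \max_{a \in A} \{h^T X^{-1}(s) N a\}$$
for almost every time $s$.

For suppose not; then there would exist a subset $E \subset [0,\tau^*]$ of positive measure, such that
$$h^T X^{-1}(s) N \alpha^*(s) < \max_{a \in A} \{h^T X^{-1}(s) N a\}$$
for $s \in E$. Design a new control $\hat{\alpha}(\cdot)$ as follows:
$$\hat{\alpha}(s) = \begin{cases} \alpha^*(s) & (s \notin E) \\ \alpha(s) & (s \in E), \end{cases}$$
where $\alpha(s)$ is selected so that
$$\max_{a \in A} \{h^T X^{-1}(s) N a\} = h^T X^{-1}(s) N \alpha(s).$$
Then
$$\int_0^{\tau^*} h^T X^{-1}(s) N \big(\alpha^*(s) - \hat{\alpha}(s)\big)\, ds = \int_E \underbrace{h^T X^{-1}(s) N \big(\alpha^*(s) - \alpha(s)\big)}_{< 0}\, ds < 0.$$
This contradicts Step 2 above.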





For later reference, we pause here to rewrite the foregoing into different notation; this will turn out to be a special case of the general theory developed later in Chapter 4. First of all, define the Hamiltonian
$$H(x,p,a) := (Mx + Na) \cdot p \qquad (x, p \in \mathbb{R}^n,\ a \in A).$$
THEOREM 3.4 (ANOTHER WAY TO WRITE PONTRYAGIN MAXIMUM PRINCIPLE FOR TIME-OPTIMAL CONTROL). Let $\alpha^*(\cdot)$ be a time-optimal control and $x^*(\cdot)$ the corresponding response.

Then there exists a function $p^*(\cdot) : [0,\tau^*] \to \mathbb{R}^n$, such that
$$\text{(ODE)} \qquad \dot{x}^*(t) = \nabla_p H(x^*(t), p^*(t), \alpha^*(t)),$$
$$\text{(ADJ)} \qquad \dot{p}^*(t) = -\nabla_x H(x^*(t), p^*(t), \alpha^*(t)),$$
and
$$\text{(M)} \qquad H(x^*(t), p^*(t), \alpha^*(t)) = \max_{a \in A} H(x^*(t), p^*(t), a).$$

We call (ADJ) the adjoint equations and (M) the maximization principle. The function $p^*(\cdot)$ is the costate.


Proof. 1. Select the vector $h$ as in Theorem 3.3, and consider the system
$$\begin{cases} \dot{p}^*(t) = -M^T p^*(t) \\ p^*(0) = h. \end{cases}$$
The solution is $p^*(t) = e^{-tM^T} h$; and hence
$$p^*(t)^T = h^T X^{-1}(t),$$
since $(e^{-tM^T})^T = e^{-tM} = X^{-1}(t)$.


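2. We know from condition (M) in Theorem 3.3 that
$$h^T X^{-1}(t) N \alpha^*(t) = \max_{a \in A} \{h^T X^{-1}(t) N a\}.$$
Since $p^*(t)^T = h^T X^{-1}(t)$, this means that
$$p^*(t)^T \big(Mx^*(t) + N\alpha^*(t)\big) = \max_{a \in A} \{p^*(t)^T \big(Mx^*(t) + Na\big)\}.$$
(The term $p^*(t)^T M x^*(t)$ does not depend on $a$, so adding it to both sides preserves the equality.)

3. Finally, we observe that according to the definition of the Hamiltonian $H$, we have $\nabla_p H = Mx + Na$ and $\nabla_x H = M^T p$; so the dynamical equations for $x^*(\cdot), p^*(\cdot)$ take the form (ODE) and (ADJ), as stated in the Theorem.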
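Step 1 of the proof can be checked numerically. The sketch below is our own, with a $2 \times 2$ matrix $M$ and vector $h$ chosen purely for illustration: it builds matrix exponentials from a truncated Taylor series, then confirms that $p^*(t) = e^{-tM^T}h$ satisfies $\dot{p}^* = -M^T p^*$ (by a centered difference) and that $p^*(t)^T = h^T e^{-tM} = h^T X^{-1}(t)$.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def scale(c, A):
    return [[c * x for x in row] for row in A]

def expm(A, terms=40):
    """e^A for a 2x2 matrix via the truncated series sum_k A^k / k! (adequate for small |A|)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

M = [[0.0, 1.0], [-2.0, -0.3]]  # hypothetical system matrix
h = [0.7, -1.2]                 # hypothetical vector h from Theorem 3.3

def p_star(t):
    E = expm(scale(-t, transpose(M)))  # e^{-t M^T}
    return [E[0][0] * h[0] + E[0][1] * h[1], E[1][0] * h[0] + E[1][1] * h[1]]

def hT_Xinv(t):
    Xinv = expm(scale(-t, M))          # X^{-1}(t) = e^{-t M}
    return [h[0] * Xinv[0][0] + h[1] * Xinv[1][0], h[0] * Xinv[0][1] + h[1] * Xinv[1][1]]

t, dt = 0.8, 1e-5
p = p_star(t)
dp = [(u - w) / (2 * dt) for u, w in zip(p_star(t + dt), p_star(t - dt))]
rhs = [-(M[0][0] * p[0] + M[1][0] * p[1]), -(M[0][1] * p[0] + M[1][1] * p[1])]  # -M^T p
print(dp, rhs)         # centered difference vs. right-hand side of (ADJ)
print(p, hT_Xinv(t))   # p*(t) vs. the row vector h^T X^{-1}(t)
```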














3.3 EXAMPLES 
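EXAMPLE 1: ROCKET RAILROAD CAR. We recall this example, introduced in §1.2. We have
$$\text{(ODE)} \qquad \dot{x}(t) = \underbrace{\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}}_{M} x(t) + \underbrace{\begin{pmatrix} 0 \\ 1 \end{pmatrix}}_{N} \alpha(t)$$
for
$$x(t) = \begin{pmatrix} x^1(t) \\ x^2(t) \end{pmatrix}, \qquad A = [-1,1].$$

According to the Pontryagin Maximum Principle, there exists $h \ne 0$ such that
$$\text{(M)} \qquad h^T X^{-1}(t) N \alpha^*(t) = \max_{|a| \le 1} \{h^T X^{-1}(t) N a\}.$$

We will extract the interesting fact that an optimal control $\alpha^*$ switches at most one time. We must compute $e^{tM}$. To do so, we observe
$$M^0 = I, \qquad M = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad M^2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = 0;$$
and therefore $M^k = 0$ for all $k \ge 2$. Consequently,
$$e^{tM} = I + tM = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}.$$
Then
$$X^{-1}(t) = \begin{pmatrix} 1 & -t \\ 0 & 1 \end{pmatrix}, \qquad X^{-1}(t) N = \begin{pmatrix} 1 & -t \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -t \\ 1 \end{pmatrix},$$
$$h^T X^{-1}(t) N = (h_1, h_2)\begin{pmatrix} -t \\ 1 \end{pmatrix} = -t h_1 + h_2.$$

The Maximum Principle asserts
$$(-t h_1 + h_2)\alpha^*(t) = \max_{|a| \le 1} \{(-t h_1 + h_2) a\};$$
and this implies that
$$\alpha^*(t) = \operatorname{sgn}(-t h_1 + h_2)$$
for the sign function
$$\operatorname{sgn} x = \begin{cases} 1 & x > 0 \\ 0 & x = 0 \\ -1 & x < 0. \end{cases}$$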
Therefore the optimal control $\alpha^*$ switches at most once, since the linear function $-t h_1 + h_2$ changes sign at most once; and if $h_1 = 0$, then $\alpha^*$ is constant. Since the optimal control switches at most once, the control we constructed by a geometric method in §1.3 must have been optimal.
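The formula $\alpha^*(t) = \operatorname{sgn}(-t h_1 + h_2)$ can be tabulated directly. In the sketch below the vectors $h$ are hypothetical (the Maximum Principle only asserts that some $h \ne 0$ exists); the code counts the sign changes of the switching function on a grid and never finds more than one.

```python
def sgn(x):
    return (x > 0) - (x < 0)

def switching_function(h, t):
    """h^T X^{-1}(t) N for the rocket car, where X^{-1}(t) N = (-t, 1)^T."""
    return -t * h[0] + h[1]

def count_switches(h, t_end, samples=10_000):
    """Count sign changes of alpha*(t) = sgn(-t h1 + h2) on a grid over [0, t_end]."""
    signs = [sgn(switching_function(h, t_end * k / samples)) for k in range(samples + 1)]
    signs = [s for s in signs if s != 0]  # ignore the isolated zero of the switching function
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

for h in [(1.0, 2.0), (-0.5, 1.0), (0.0, -1.0)]:
    print(h, count_switches(h, 10.0))
```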


EXAMPLE 2: CONTROL OF A VIBRATING SPRING. Consider next the simple dynamics
$$\ddot{x} + x = \alpha,$$
where we interpret the control as an exterior force acting on an oscillating weight (of unit mass) hanging from a spring. Our goal is to design an optimal exterior forcing $\alpha^*(\cdot)$ that brings the motion to a stop in minimum time.


FIGURE: spring and mass


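We have $n = 2$, $m = 1$. The individual dynamical equations read:
$$\begin{cases} \dot{x}^1(t) = x^2(t) \\ \dot{x}^2(t) = -x^1(t) + \alpha(t); \end{cases}$$
which in vector notation become
$$\text{(ODE)} \qquad \dot{x}(t) = \underbrace{\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}}_{M} x(t) + \underbrace{\begin{pmatrix} 0 \\ 1 \end{pmatrix}}_{N} \alpha(t)$$
for $|\alpha(t)| \le 1$. That is, $A = [-1,1]$.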


Using the maximum principle. We employ the Pontryagin Maximum Principle, which asserts that there exists $h \ne 0$ such that
$$\text{(M)} \qquad h^T X^{-1}(t) N \alpha^*(t) = \max_{a \in A} \{h^T X^{-1}(t) N a\}.$$


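To extract useful information from (M) we must compute $X(\cdot)$. To do so, we observe that the matrix $M$ is skew symmetric, and that
$$M^0 = I, \qquad M = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad M^2 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = -I.$$
Therefore
$$M^k = \begin{cases} I & \text{if } k = 0, 4, 8, \dots \\ M & \text{if } k = 1, 5, 9, \dots \\ -I & \text{if } k = 2, 6, \dots \\ -M & \text{if } k = 3, 7, \dots \end{cases}$$
and consequently
$$\begin{aligned} e^{tM} &= I + tM + \frac{t^2}{2!}M^2 + \cdots \\ &= I + tM - \frac{t^2}{2!}I - \frac{t^3}{3!}M + \frac{t^4}{4!}I + \cdots \\ &= \Big(1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \cdots\Big)I + \Big(t - \frac{t^3}{3!} + \frac{t^5}{5!} - \cdots\Big)M \\ &= \cos t\, I + \sin t\, M = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}. \end{aligned}$$
So we have
$$X^{-1}(t) = e^{-tM} = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}$$
and
$$X^{-1}(t) N = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -\sin t \\ \cos t \end{pmatrix};$$
whence
$$h^T X^{-1}(t) N = (h_1, h_2)\begin{pmatrix} -\sin t \\ \cos t \end{pmatrix} = -h_1 \sin t + h_2 \cos t.$$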
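As a check on the series computation of $e^{tM}$, the following sketch sums the truncated Taylor series numerically and compares it with the rotation matrix $\begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}$. The number of series terms (40) is our own choice and is adequate only for moderate $|t|$.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def expm_series(A, terms=40):
    """e^A for a 2x2 matrix from the truncated series sum_{k < terms} A^k / k!."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

M = [[0.0, 1.0], [-1.0, 0.0]]

def max_error(t):
    """Largest entrywise gap between the series for e^{tM} and [[cos t, sin t], [-sin t, cos t]]."""
    E = expm_series([[t * x for x in row] for row in M])
    R = [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]
    return max(abs(E[i][j] - R[i][j]) for i in range(2) for j in range(2))

for t in (0.5, 1.0, 2.5, -1.3):
    print(t, max_error(t))
```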


According to condition (M), for each time $t$ we have
$$(-h_1 \sin t + h_2 \cos t)\alpha^*(t) = \max_{|a| \le 1} \{(-h_1 \sin t + h_2 \cos t) a\}.$$
Therefore
$$\alpha^*(t) = \operatorname{sgn}(-h_1 \sin t + h_2 \cos t).$$


Finding the optimal control. To simplify further, we may assume $h_1^2 + h_2^2 = 1$. Recall the trig identity $\sin(x + y) = \sin x \cos y + \cos x \sin y$, and choose $\delta$ such that $-h_1 = \cos \delta$, $h_2 = \sin \delta$. Then
$$\alpha^*(t) = \operatorname{sgn}(\cos\delta \sin t + \sin\delta \cos t) = \operatorname{sgn}(\sin(t + \delta)).$$
We deduce therefore that $\alpha^*$ switches from $+1$ to $-1$, and vice versa, every $\pi$ units of time.
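The switching times can also be located numerically. The sketch below uses a hypothetical unit vector $h = (0.6, 0.8)$, finds the zeros of $-h_1 \sin t + h_2 \cos t$ on $[0, 4\pi]$ by a grid search refined with bisection, and checks that consecutive switches are $\pi$ apart; it also confirms that the switching function agrees with $\sin(t + \delta)$.

```python
import math

def sigma(h, t):
    """Switching function h^T X^{-1}(t) N = -h1 sin t + h2 cos t."""
    return -h[0] * math.sin(t) + h[1] * math.cos(t)

def delta_of(h):
    """Angle with cos(delta) = -h1 and sin(delta) = h2, after normalizing |h| = 1."""
    r = math.hypot(h[0], h[1])
    return math.atan2(h[1] / r, -h[0] / r)

def switch_times(h, t_end, samples=4000, iters=60):
    """Grid search for sign changes of sigma, each refined by bisection."""
    grid = [t_end * k / samples for k in range(samples + 1)]
    times = []
    for a, b in zip(grid, grid[1:]):
        if sigma(h, a) * sigma(h, b) < 0:
            lo, hi = a, b
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                if sigma(h, lo) * sigma(h, mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            times.append(0.5 * (lo + hi))
    return times

h = (0.6, 0.8)  # hypothetical multiplier with |h| = 1
ts = switch_times(h, 4 * math.pi)
print(ts)
print([b - a for a, b in zip(ts, ts[1:])])  # gaps between consecutive switches
```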


Geometric interpretation. Next, we figure out the geometric consequences. When $\alpha \equiv 1$, our (ODE) becomes
$$\begin{cases} \dot{x}^1 = x^2 \\ \dot{x}^2 = -x^1 + 1. \end{cases}$$
In this case, we can calculate that
$$\frac{d}{dt}\Big((x^1(t) - 1)^2 + (x^2)^2(t)\Big) = 2(x^1(t) - 1)\dot{x}^1(t) + 2x^2(t)\dot{x}^2(t) = 2(x^1(t) - 1)x^2(t) + 2x^2(t)(-x^1(t) + 1) = 0.$$
Consequently, the motion satisfies $(x^1(t) - 1)^2 + (x^2)^2(t) \equiv r_1^2$, for some radius $r_1$, and therefore the trajectory lies on a circle with center $(1,0)$, as illustrated.
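If $\alpha \equiv -1$, then (ODE) instead becomes
$$\begin{cases} \dot{x}^1 = x^2 \\ \dot{x}^2 = -x^1 - 1, \end{cases}$$
in which case
$$\frac{d}{dt}\Big((x^1(t) + 1)^2 + (x^2)^2(t)\Big) = 0.$$
Thus $(x^1(t) + 1)^2 + (x^2)^2(t) \equiv r_2^2$ for some radius $r_2$, and the motion lies on a circle with center $(-1,0)$.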








In summary, to get to the origin we must switch our control $\alpha(\cdot)$ back and forth between the values $\pm 1$, causing the trajectory to switch between lying on circles centered at $(\pm 1, 0)$. The switches occur every $\pi$ units of time.
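The circle picture can be confirmed by integrating (ODE) numerically with a constant control. This sketch (a classical Runge-Kutta integrator with a step count of our choosing and a hypothetical initial state) checks that $(x^1 - a)^2 + (x^2)^2$ stays constant for $a = \pm 1$, and that after $\pi$ units of time the state has made half a turn about $(a, 0)$, landing at its mirror image $2(a,0) - x$.

```python
import math

def rk4_step(x, a, h):
    """One RK4 step for x1' = x2, x2' = -x1 + a with constant control a."""
    def f(y):
        return (y[1], -y[0] + a)
    k1 = f(x)
    k2 = f((x[0] + h / 2 * k1[0], x[1] + h / 2 * k1[1]))
    k3 = f((x[0] + h / 2 * k2[0], x[1] + h / 2 * k2[1]))
    k4 = f((x[0] + h * k3[0], x[1] + h * k3[1]))
    return (x[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            x[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

def flow(x, a, duration, steps=2000):
    h = duration / steps
    for _ in range(steps):
        x = rk4_step(x, a, h)
    return x

def radius2(x, a):
    """Squared distance from the circle center (a, 0), for control a = +1 or -1."""
    return (x[0] - a) ** 2 + x[1] ** 2

x0 = (3.0, 0.5)  # hypothetical initial state
for a in (1.0, -1.0):
    x1 = flow(x0, a, math.pi)  # pi time units with alpha = a
    print(a, radius2(x0, a), radius2(x1, a), x1, (2 * a - x0[0], -x0[1]))
```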
CHAPTER 4: THE PONTRYAGIN MAXIMUM PRINCIPLE


4.1 Calculus of variations, Hamiltonian dynamics 

4.2 Review of Lagrange multipliers 

4.3 Statement of Pontryagin Maximum Principle 

4.4 Applications and examples 

4.5 Maximum Principle with transversality conditions 
4.6 More applications 

4.7 Maximum Principle with state constraints 

4.8 More applications 

4.9 References 


This important chapter moves us beyond the linear dynamics assumed in Chapters 2 and 
3, to consider much wider classes of optimal control problems, to introduce the fundamental 
Pontryagin Maximum Principle, and to illustrate its uses in a variety of examples. 


4.1 CALCULUS OF VARIATIONS, HAMILTONIAN DYNAMICS 
We begin in this section with a quick introduction to some variational methods. These 
ideas will later serve as motivation for the Pontryagin Maximum Principle. 














Assume we are given a smooth function $L : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, $L = L(x,v)$; $L$ is called the Lagrangian. Let $T > 0$ and $x^0, x^1 \in \mathbb{R}^n$ be given.

BASIC PROBLEM OF THE CALCULUS OF VARIATIONS. Find a curve $x^*(\cdot) : [0,T] \to \mathbb{R}^n$ that minimizes the functional
$$(4.1) \qquad I[x(\cdot)] := \int_0^T L(x(t), \dot{x}(t))\, dt$$
among all functions $x(\cdot)$ satisfying $x(0) = x^0$ and $x(T) = x^1$.


Now assume $x^*(\cdot)$ solves our variational problem. The fundamental question is this: how can we characterize $x^*(\cdot)$?

4.1.1 DERIVATION OF EULER-LAGRANGE EQUATIONS.

NOTATION. We write $L = L(x,v)$, and regard the variable $x$ as denoting position, the variable $v$ as denoting velocity. The partial derivatives of $L$ are
$$\frac{\partial L}{\partial x_i} = L_{x_i}, \qquad \frac{\partial L}{\partial v_i} = L_{v_i} \qquad (1 \le i \le n),$$
and we write
$$\nabla_x L := (L_{x_1}, \dots, L_{x_n}), \qquad \nabla_v L := (L_{v_1}, \dots, L_{v_n}).$$
THEOREM 4.1 (EULER-LAGRANGE EQUATION). Let $x^*(\cdot)$ solve the calculus of variations problem. Then $x^*(\cdot)$ solves the Euler-Lagrange differential equations:
$$\text{(E-L)} \qquad \frac{d}{dt}\Big[\nabla_v L(x^*(t), \dot{x}^*(t))\Big] = \nabla_x L(x^*(t), \dot{x}^*(t)).$$


The significance of the preceding theorem is that if we can solve the Euler-Lagrange equations (E-L), then the solution of our original calculus of variations problem (assuming it exists) will be among the solutions.

Note that (E-L) is a quasilinear system of $n$ second-order ODE. The $i$-th component of the system reads
$$\frac{d}{dt}\Big[L_{v_i}(x^*(t), \dot{x}^*(t))\Big] = L_{x_i}(x^*(t), \dot{x}^*(t)).$$

Proof. 1. Select any smooth curve $y : [0,T] \to \mathbb{R}^n$, satisfying $y(0) = y(T) = 0$. Define
$$i(\tau) := I[x(\cdot) + \tau y(\cdot)]$$
for $\tau \in \mathbb{R}$ and $x(\cdot) = x^*(\cdot)$. (To simplify we omit the superscript $*$.) Notice that $x(\cdot) + \tau y(\cdot)$ takes on the proper values at the endpoints. Hence, since $x(\cdot)$ is a minimizer, we have
$$i(\tau) \ge I[x(\cdot)] = i(0).$$
Consequently $i(\cdot)$ has a minimum at $\tau = 0$, and so
$$i'(0) = 0.$$
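2. We must compute $i'(\tau)$. Note first that
$$i(\tau) = \int_0^T L(x(t) + \tau y(t), \dot{x}(t) + \tau \dot{y}(t))\, dt;$$
and hence
$$i'(\tau) = \int_0^T \sum_{i=1}^n \Big( L_{x_i}(x(t) + \tau y(t), \dot{x}(t) + \tau \dot{y}(t))\, y^i(t) + L_{v_i}(x(t) + \tau y(t), \dot{x}(t) + \tau \dot{y}(t))\, \dot{y}^i(t) \Big)\, dt.$$
Let $\tau = 0$. Then
$$0 = i'(0) = \sum_{i=1}^n \int_0^T L_{x_i}(x(t), \dot{x}(t))\, y^i(t) + L_{v_i}(x(t), \dot{x}(t))\, \dot{y}^i(t)\, dt.$$
This equality holds for all choices of $y : [0,T] \to \mathbb{R}^n$, with $y(0) = y(T) = 0$.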
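3. Fix any $1 \le j \le n$. Choose $y(\cdot)$ so that
$$y^i(t) \equiv 0 \quad (i \ne j), \qquad y^j(t) = \psi(t),$$
where $\psi$ is an arbitrary smooth function with $\psi(0) = \psi(T) = 0$. Use this choice of $y(\cdot)$ above:
$$0 = \int_0^T L_{x_j}(x(t), \dot{x}(t))\, \psi(t) + L_{v_j}(x(t), \dot{x}(t))\, \dot{\psi}(t)\, dt.$$
Integrate by parts, recalling that $\psi(0) = \psi(T) = 0$:
$$0 = \int_0^T \Big[ L_{x_j}(x(t), \dot{x}(t)) - \frac{d}{dt}\Big(L_{v_j}(x(t), \dot{x}(t))\Big) \Big] \psi(t)\, dt.$$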














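This holds for all $\psi : [0,T] \to \mathbb{R}$ with $\psi(0) = \psi(T) = 0$, and therefore
$$L_{x_j}(x(t), \dot{x}(t)) - \frac{d}{dt}\Big(L_{v_j}(x(t), \dot{x}(t))\Big) = 0$$
for all times $0 \le t \le T$. To see this, observe that otherwise $L_{x_j} - \frac{d}{dt}(L_{v_j})$ would be, say, positive on some subinterval $J \subseteq [0,T]$. Choose $\psi \equiv 0$ off $J$, $\psi > 0$ on $J$. Then
$$\int_0^T \Big( L_{x_j} - \frac{d}{dt}(L_{v_j}) \Big) \psi\, dt > 0,$$
a contradiction.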
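The proof's key computation, $i'(0) = 0$, can be watched numerically. The sketch below assumes $n = 1$, the Lagrangian $L(x,v) = \frac{1}{2}v^2 - \frac{1}{2}x^2$ (for which (E-L) reads $\ddot{x} = -x$), $T = 1$, $x^0 = 0$ and $x^1 = 1$, so that $x^*(t) = \sin t / \sin 1$, together with the hypothetical variation $y(t) = t(1 - t)$. A centered difference of $i(\tau) = I[x^* + \tau y]$ at $\tau = 0$ comes out essentially zero, and for this $y$ we also see $i(\pm 0.1) > i(0)$.

```python
import math

T = 1.0  # hypothetical horizon; boundary values x(0) = 0, x(T) = 1

def lagrangian(x, v):
    """Assumed Lagrangian L(x, v) = v^2/2 - x^2/2, whose (E-L) equation is x'' = -x."""
    return 0.5 * v * v - 0.5 * x * x

def x_star(t):
    return math.sin(t) / math.sin(T)  # solves x'' = -x, x(0) = 0, x(T) = 1

def dx_star(t):
    return math.cos(t) / math.sin(T)

def y(t):
    return t * (T - t)  # hypothetical variation with y(0) = y(T) = 0

def dy(t):
    return T - 2.0 * t

def i_of(tau, steps=20_000):
    """i(tau) = I[x* + tau y], by the midpoint rule with exact derivatives."""
    h = T / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += lagrangian(x_star(t) + tau * y(t), dx_star(t) + tau * dy(t))
    return h * total

eps = 1e-3
i_prime_0 = (i_of(eps) - i_of(-eps)) / (2 * eps)  # centered difference for i'(0)
print(i_prime_0)
print(i_of(-0.1) - i_of(0.0), i_of(0.1) - i_of(0.0))
```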





4.1.2 CONVERSION TO HAMILTON'S EQUATIONS.
DEFINITION. For the given curve $x(\cdot)$, define
$$p(t) := \nabla_v L(x(t), \dot{x}(t)) \qquad (0 \le t \le T).$$
We call $p(\cdot)$ the generalized momentum.

Our intention now is to rewrite the Euler-Lagrange equations as a system of first-order ODE for $x(\cdot), p(\cdot)$.

IMPORTANT HYPOTHESIS: Assume that for all $x, p \in \mathbb{R}^n$, we can solve the equation
$$(4.2) \qquad p = \nabla_v L(x,v)$$
for $v$ in terms of $x$ and $p$. That is, we suppose we can solve the identity (4.2) for $v = v(x,p)$.
DEFINITION. Define the dynamical systems Hamiltonian $H : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ by the formula
$$H(x,p) = p \cdot v(x,p) - L(x, v(x,p)),$$
where $v$ is defined above.

NOTATION. The partial derivatives of $H$ are
$$\frac{\partial H}{\partial x_i} = H_{x_i}, \qquad \frac{\partial H}{\partial p_i} = H_{p_i} \qquad (1 \le i \le n),$$
and we write
$$\nabla_x H := (H_{x_1}, \dots, H_{x_n}), \qquad \nabla_p H := (H_{p_1}, \dots, H_{p_n}).$$

THEOREM 4.2 (HAMILTONIAN DYNAMICS). Let $x(\cdot)$ solve the Euler-Lagrange equations (E-L) and define $p(\cdot)$ as above. Then the pair $(x(\cdot), p(\cdot))$ solves Hamilton's equations:
$$\text{(H)} \qquad \begin{cases} \dot{x}(t) = \nabla_p H(x(t), p(t)) \\ \dot{p}(t) = -\nabla_x H(x(t), p(t)). \end{cases}$$
Furthermore, the mapping $t \mapsto H(x(t), p(t))$ is constant.
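Proof. Recall that $H(x,p) = p \cdot v(x,p) - L(x, v(x,p))$, where $v = v(x,p)$ or, equivalently, $p = \nabla_v L(x,v)$. Then
$$\nabla_x H(x,p) = p \cdot \nabla_x v - \nabla_x L(x, v(x,p)) - \nabla_v L(x, v(x,p)) \cdot \nabla_x v = -\nabla_x L(x, v(x,p))$$
because $p = \nabla_v L$. Now $p(t) = \nabla_v L(x(t), \dot{x}(t))$ if and only if $\dot{x}(t) = v(x(t), p(t))$. Therefore (E-L) implies
$$\dot{p}(t) = \nabla_x L(x(t), \dot{x}(t)) = \nabla_x L(x(t), v(x(t), p(t))) = -\nabla_x H(x(t), p(t)).$$
Also
$$\nabla_p H(x,p) = v(x,p) + p \cdot \nabla_p v - \nabla_v L \cdot \nabla_p v = v(x,p)$$
since $p = \nabla_v L(x, v(x,p))$. This implies
$$\nabla_p H(x(t), p(t)) = v(x(t), p(t)).$$
But
$$p(t) = \nabla_v L(x(t), \dot{x}(t))$$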
and so $\dot{x}(t) = v(x(t), p(t))$. Therefore
$$\dot{x}(t) = \nabla_p H(x(t), p(t)).$$

Finally note that
$$\frac{d}{dt} H(x(t), p(t)) = \nabla_x H \cdot \dot{x}(t) + \nabla_p H \cdot \dot{p}(t) = \nabla_x H \cdot \nabla_p H + \nabla_p H \cdot (-\nabla_x H) = 0.$$














A PHYSICAL EXAMPLE. We define the Lagrangian
$$L(x,v) = \frac{m|v|^2}{2} - V(x),$$
which we interpret as the kinetic energy minus the potential energy $V$. Then
$$\nabla_x L = -\nabla V(x), \qquad \nabla_v L = mv.$$
Therefore the Euler-Lagrange equation is
$$m\ddot{x}(t) = -\nabla V(x(t)),$$
which is Newton's law. Furthermore
$$p = \nabla_v L(x,v) = mv$$
is the momentum, and the Hamiltonian is
$$H(x,p) = p \cdot \frac{p}{m} - L\Big(x, \frac{p}{m}\Big) = \frac{|p|^2}{m} - \frac{m}{2}\Big|\frac{p}{m}\Big|^2 + V(x) = \frac{|p|^2}{2m} + V(x),$$
the sum of the kinetic and potential energies. For this example, Hamilton's equations read
$$\begin{cases} \dot{x}(t) = \dfrac{p(t)}{m} \\ \dot{p}(t) = -\nabla V(x(t)). \end{cases}$$
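To see the conservation of $H$ in action, the sketch below integrates Hamilton's equations for this example in one dimension, under the assumed quadratic potential $V(x) = \frac{k}{2}x^2$ with hypothetical values of $m$ and $k$ (here $k$ is a spring constant, unrelated to other uses of the letter). The energy $H$ at the final time matches its initial value up to integration error, and $x(t)$ matches the exact solution $\cos(\sqrt{k/m}\,t)$ for $x(0) = 1$, $p(0) = 0$.

```python
import math

def hamiltonian(x, p, m, k):
    """H = p^2 / (2m) + V(x) with the (assumed) potential V(x) = k x^2 / 2."""
    return p * p / (2 * m) + 0.5 * k * x * x

def hamilton_rhs(x, p, m, k):
    """Hamilton's equations: x' = p/m, p' = -V'(x) = -k x."""
    return p / m, -k * x

def integrate(x, p, m, k, t_end, steps=10_000):
    """Classical RK4 for Hamilton's equations."""
    h = t_end / steps
    for _ in range(steps):
        k1 = hamilton_rhs(x, p, m, k)
        k2 = hamilton_rhs(x + h / 2 * k1[0], p + h / 2 * k1[1], m, k)
        k3 = hamilton_rhs(x + h / 2 * k2[0], p + h / 2 * k2[1], m, k)
        k4 = hamilton_rhs(x + h * k3[0], p + h * k3[1], m, k)
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x, p

m, k = 2.0, 3.0  # hypothetical mass and spring constant
x0, p0 = 1.0, 0.0
xT, pT = integrate(x0, p0, m, k, t_end=10.0)
print(hamiltonian(x0, p0, m, k), hamiltonian(xT, pT, m, k))  # energy at t = 0 and t = 10
print(xT, math.cos(math.sqrt(k / m) * 10.0))                 # numerical vs. exact x(10)
```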














4.2 REVIEW OF LAGRANGE MULTIPLIERS.

CONSTRAINTS AND LAGRANGE MULTIPLIERS. What first strikes us about general optimal control problems is the occurrence of many constraints, most notably that the dynamics be governed by the differential equation
$$\text{(ODE)} \qquad \begin{cases} \dot{x}(t) = f(x(t), \alpha(t)) & (t > 0) \\ x(0) = x^0. \end{cases}$$
This is in contrast to standard calculus of variations problems, as discussed in §4.1, where we could take any curve $x(\cdot)$ as a candidate for a minimizer.


Now it is a general principle of variational and optimization theory that “constraints create Lagrange multipliers” and furthermore that these Lagrange multipliers often “contain valuable information”. This section provides a quick review of the standard method of Lagrange multipliers in solving multivariable constrained optimization problems.


UNCONSTRAINED OPTIMIZATION. Suppose first that we wish to find a maximum point for a given smooth function $f : \mathbb{R}^n \to \mathbb{R}$. In this case there is no constraint, and therefore if $f(x^*) = \max_{x \in \mathbb{R}^n} f(x)$, then $x^*$ is a critical point of $f$:
$$\nabla f(x^*) = 0.$$


CONSTRAINED OPTIMIZATION. We modify the problem above by introducing the region
$$R := \{x \in \mathbb{R}^n \mid g(x) \le 0\},$$
determined by some given function $g : \mathbb{R}^n \to \mathbb{R}$. Suppose $x^* \in R$ and $f(x^*) = \max_{x \in R} f(x)$. We would like a characterization of $x^*$ in terms of the gradients of $f$ and $g$.

Case 1: $x^*$ lies in the interior of $R$. Then the constraint is inactive, and so
$$(4.3) \qquad \nabla f(x^*) = 0.$$

FIGURE 1 (gradient of f)

Case 2: $x^*$ lies on $\partial R$. We look at the direction of the vector $\nabla f(x^*)$. A geometric picture like Figure 1 is impossible; for if it were so, then $f(y^*)$ would be greater than $f(x^*)$ for some other point $y^* \in \partial R$. So it must be that $\nabla f(x^*)$ is perpendicular to $\partial R$ at $x^*$, as shown in Figure 2.
FIGURE 2 (gradient of f, gradient of g)

Since $\nabla g$ is perpendicular to $\partial R = \{g = 0\}$, it follows that $\nabla f(x^*)$ is parallel to $\nabla g(x^*)$. Therefore
$$(4.4) \qquad \nabla f(x^*) = \lambda \nabla g(x^*)$$
for some real number $\lambda$, called a Lagrange multiplier.
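Here is a small worked instance of (4.4), a sketch with a hypothetical objective and constraint: maximize $f(x,y) = x + 2y$ over the unit disk $R = \{g \le 0\}$ with $g(x,y) = x^2 + y^2 - 1$. Solving (4.4) together with $g(x^*) = 0$ by hand gives $x^* = (1, 2)/\sqrt{5}$ and $\lambda = \sqrt{5}/2$. The code compares this with a brute-force search over a polar grid and checks that $\nabla f(x^*) - \lambda \nabla g(x^*)$ vanishes; here $\nabla g(x^*) \ne 0$, so Case 2 applies as stated.

```python
import math

def f(x, y):
    return x + 2.0 * y  # hypothetical objective

def g(x, y):
    return x * x + y * y - 1.0  # constraint region R = {g <= 0} is the unit disk

def grad_f(x, y):
    return (1.0, 2.0)

def grad_g(x, y):
    return (2.0 * x, 2.0 * y)

# Brute-force maximization of f over a polar grid covering R.
candidates = ((r * math.cos(th), r * math.sin(th))
              for r in (i / 100 for i in range(101))
              for th in (2 * math.pi * j / 3600 for j in range(3600)))
best = max(candidates, key=lambda q: f(*q))

# Solution of (4.4) together with g = 0, found by hand.
x_star = (1 / math.sqrt(5), 2 / math.sqrt(5))
lam = grad_f(*x_star)[0] / grad_g(*x_star)[0]
print(best, x_star, lam)
print([a - lam * b for a, b in zip(grad_f(*x_star), grad_g(*x_star))])  # both near 0
```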


CRITIQUE. The foregoing argument is in fact incomplete, since we implicitly assumed that $\nabla g(x^*) \ne 0$, in which case the Implicit Function Theorem implies that the set $\{g = 0\}$ is an $(n-1)$-dimensional surface near $x^*$ (as illustrated).

If instead $\nabla g(x^*) = 0$, the set $\{g = 0\}$ need not have this simple form near $x^*$; and the reasoning discussed as Case 2 above is not complete.

The correct statement is this:
$$(4.5) \qquad \text{There exist real numbers } \lambda \text{ and } \mu, \text{ not both equal to } 0, \text{ such that } \mu \nabla f(x^*) = \lambda \nabla g(x^*).$$
If $\mu \ne 0$, we can divide by $\mu$ and convert to the formulation (4.4). And if $\nabla g(x^*) = 0$, we can take $\lambda = 1$, $\mu = 0$, making assertion (4.5) correct (if not particularly useful).





4.3 STATEMENT OF PONTRYAGIN MAXIMUM PRINCIPLE

We come now to the key assertion of this chapter, the theoretically interesting and practically useful theorem that if $\alpha^*(\cdot)$ is an optimal control, then there exists a function $p^*(\cdot)$, called the costate, that satisfies a certain maximization principle. We should think of the function $p^*(\cdot)$ as a sort of Lagrange multiplier, which appears owing to the constraint that the optimal curve $x^*(\cdot)$ must satisfy (ODE). And just as conventional Lagrange multipliers are useful for actual calculations, so also will be the costate.

4.3.1 FIXED TIME, FREE ENDPOINT PROBLEM. Let us review the basic setup for our control problem.
We are given $A \subseteq \mathbb{R}^m$ and also $f : \mathbb{R}^n \times A \to \mathbb{R}^n$, $x^0 \in \mathbb{R}^n$. We as before denote the set of admissible controls by
$$\mathcal{A} = \{\alpha(\cdot) : [0,\infty) \to A \mid \alpha(\cdot) \text{ is measurable}\}.$$
Then given $\alpha(\cdot) \in \mathcal{A}$, we solve for the corresponding evolution of our system:
$$\text{(ODE)} \qquad \begin{cases} \dot{x}(t) = f(x(t), \alpha(t)) & (t \ge 0) \\ x(0) = x^0. \end{cases}$$
We also introduce the payoff functional
$$\text{(P)} \qquad P[\alpha(\cdot)] = \int_0^T r(x(t), \alpha(t))\, dt + g(x(T)),$$
where the terminal time $T > 0$, running payoff $r : \mathbb{R}^n \times A \to \mathbb{R}$ and terminal payoff $g : \mathbb{R}^n \to \mathbb{R}$ are given.

BASIC PROBLEM: Find a control $\alpha^*(\cdot)$ such that
$$P[\alpha^*(\cdot)] = \max_{\alpha(\cdot) \in \mathcal{A}} P[\alpha(\cdot)].$$

The Pontryagin Maximum Principle, stated below, asserts the existence of a function $p^*(\cdot)$, which together with the optimal trajectory $x^*(\cdot)$ satisfies an analog of Hamilton's ODE from §4.1. For this, we will need an appropriate Hamiltonian:

DEFINITION. The control theory Hamiltonian is the function
$$H(x,p,a) := f(x,a) \cdot p + r(x,a) \qquad (x, p \in \mathbb{R}^n,\ a \in A).$$
THEOREM 4.3 (PONTRYAGIN MAXIMUM PRINCIPLE). Assume $\alpha^*(\cdot)$ is optimal for (ODE), (P) and $x^*(\cdot)$ is the corresponding trajectory.

Then there exists a function $p^* : [0,T] \to \mathbb{R}^n$ such that
$$\text{(ODE)} \qquad \dot{x}^*(t) = \nabla_p H(x^*(t), p^*(t), \alpha^*(t)),$$
$$\text{(ADJ)} \qquad \dot{p}^*(t) = -\nabla_x H(x^*(t), p^*(t), \alpha^*(t)),$$
and
$$\text{(M)} \qquad H(x^*(t), p^*(t), \alpha^*(t)) = \max_{a \in A} H(x^*(t), p^*(t), a) \qquad (0 \le t \le T).$$
In addition,
$$\text{the mapping } t \mapsto H(x^*(t), p^*(t), \alpha^*(t)) \text{ is constant.}$$
Finally, we have the terminal condition
$$\text{(T)} \qquad p^*(T) = \nabla g(x^*(T)).$$


REMARKS AND INTERPRETATIONS. (i) The identities (ADJ) are the adjoint equations and (M) the maximization principle. Notice that (ODE) and (ADJ) resemble the structure of Hamilton's equations, discussed in §4.1.

We also call (T) the transversality condition and will discuss its significance later.

(ii) More precisely, formula (ODE) says that for $1 \le i \le n$, we have
$$\dot{x}^{i*}(t) = H_{p_i}(x^*(t), p^*(t), \alpha^*(t)) = f^i(x^*(t), \alpha^*(t)),$$
which is just the original equation of motion. Likewise, (ADJ) says
$$\dot{p}^{i*}(t) = -H_{x_i}(x^*(t), p^*(t), \alpha^*(t)).$$














4.3.2 FREE TIME, FIXED ENDPOINT PROBLEM. Let us next record the appropriate form of the Maximum Principle for a fixed endpoint problem.

As before, given a control $\alpha(\cdot) \in \mathcal{A}$, we solve for the corresponding evolution of our system:
$$\text{(ODE)} \qquad \begin{cases} \dot{x}(t) = f(x(t), \alpha(t)) & (t \ge 0) \\ x(0) = x^0. \end{cases}$$
Assume now that a target point $x^1 \in \mathbb{R}^n$ is given. We introduce then the payoff functional
$$\text{(P)} \qquad P[\alpha(\cdot)] = \int_0^\tau r(x(t), \alpha(t))\, dt.$$
Here $r : \mathbb{R}^n \times A \to \mathbb{R}$ is the given running payoff, and $\tau = \tau[\alpha(\cdot)] \le \infty$ denotes the first time the solution of (ODE) hits the target point $x^1$.

As before, the basic problem is to find an optimal control $\alpha^*(\cdot)$ such that
$$P[\alpha^*(\cdot)] = \max_{\alpha(\cdot) \in \mathcal{A}} P[\alpha(\cdot)].$$

Define the Hamiltonian $H$ as in §4.3.1.
THEOREM 4.4 (PONTRYAGIN MAXIMUM PRINCIPLE). Assume $\alpha^*(\cdot)$ is optimal for (ODE), (P) and $x^*(\cdot)$ is the corresponding trajectory.

Then there exists a function $p^* : [0,\tau^*] \to \mathbb{R}^n$ such that
$$\text{(ODE)} \qquad \dot{x}^*(t) = \nabla_p H(x^*(t), p^*(t), \alpha^*(t)),$$
$$\text{(ADJ)} \qquad \dot{p}^*(t) = -\nabla_x H(x^*(t), p^*(t), \alpha^*(t)),$$
and
$$\text{(M)} \qquad H(x^*(t), p^*(t), \alpha^*(t)) = \max_{a \in A} H(x^*(t), p^*(t), a) \qquad (0 \le t \le \tau^*).$$
Also,
$$H(x^*(t), p^*(t), \alpha^*(t)) \equiv 0 \qquad (0 \le t \le \tau^*).$$
Here $\tau^*$ denotes the first time the trajectory $x^*(\cdot)$ hits the target point $x^1$. We call $x^*(\cdot)$ the state of the optimally controlled system and $p^*(\cdot)$ the costate.

REMARK AND WARNING. More precisely, we should define
$$H(x,p,q,a) = f(x,a) \cdot p + r(x,a)\, q \qquad (q \in \mathbb{R}).$$
A more careful statement of the Maximum Principle says “there exists a constant $q \ge 0$ and a function $p^* : [0,\tau^*] \to \mathbb{R}^n$ such that (ODE), (ADJ), and (M) hold”.

If $q > 0$, we can renormalize to get $q = 1$, as we have done above. If $q = 0$, then $H$ does not depend on the running payoff $r$, and in this case the Pontryagin Maximum Principle is not useful. This is a so-called “abnormal problem”.

Compare these comments with the critique of the usual Lagrange multiplier method at the end of §4.2, and see also the proof in §A.5 of the Appendix.














4.4 APPLICATIONS AND EXAMPLES

HOW TO USE THE MAXIMUM PRINCIPLE. We mentioned earlier that the costate $p^*(\cdot)$ can be interpreted as a sort of Lagrange multiplier.

Calculations with Lagrange multipliers. Recall our discussion in §4.2 about finding a point $x^*$ that maximizes a function $f$, subject to the requirement that $g \le 0$. Now $x^* = (x_1^*, \dots, x_n^*)^T$ has $n$ unknown components we must find. Somewhat unexpectedly, it turns out in practice to be easier to solve (4.4), together with the constraint $g(x^*) = 0$, for the $n+1$ unknowns $x_1^*, \dots, x_n^*$ and $\lambda$. We repeat this key insight: it is actually easier to solve the problem if we add a new unknown, namely the Lagrange multiplier. Worked examples abound in multivariable calculus books.
Calculations with the costate. This same principle is valid for our much more complicated control theory problems: it is usually best not just to look for an optimal control $\alpha^*(\cdot)$ and an optimal trajectory $x^*(\cdot)$ alone, but also to look for the costate $p^*(\cdot)$. In practice, we add the equations (ADJ) and (M) to (ODE) and then try to solve for $\alpha^*(\cdot)$, $x^*(\cdot)$ and $p^*(\cdot)$.

The following examples show how this works in practice, in certain cases for which we can actually solve everything explicitly or, failing that, at least deduce some useful information.


4.4.1 EXAMPLE 1: LINEAR TIME-OPTIMAL CONTROL. For this example, let $A$ denote the cube $[-1,1]^m$ in $\mathbb{R}^m$. We consider again the linear dynamics:
$$\text{(ODE)} \qquad \begin{cases} \dot{x}(t) = Mx(t) + N\alpha(t) \\ x(0) = x^0, \end{cases}$$
for the payoff functional
$$\text{(P)} \qquad P[\alpha(\cdot)] = -\int_0^\tau 1\, dt = -\tau,$$
where $\tau$ denotes the first time the trajectory hits the target point $x^1 = 0$. We have $r \equiv -1$, and so
$$H(x,p,a) = f \cdot p + r = (Mx + Na) \cdot p - 1.$$

In Chapter 3 we introduced the Hamiltonian $H = (Mx + Na) \cdot p$, which differs by a constant from the present $H$. We can redefine $H$ in Chapter 3 to match the present theory: compare then Theorems 3.4 and 4.4.


4.4.2 EXAMPLE 2: CONTROL OF PRODUCTION AND CONSUMPTION.
We return to Example 1 in Chapter 1, a model for optimal consumption in a simple economy. Recall that
$$x(t) = \text{output of economy at time } t,$$
$$\alpha(t) = \text{fraction of output reinvested at time } t.$$
We have the constraint $0 \le \alpha(t) \le 1$; that is, $A = [0,1] \subset \mathbb{R}$. The economy evolves according to the dynamics
$$\text{(ODE)} \qquad \begin{cases} \dot{x}(t) = \alpha(t)x(t) & (0 \le t \le T) \\ x(0) = x^0, \end{cases}$$
where $x^0 > 0$ and we have set the growth factor $k = 1$. We want to maximize the total consumption
$$\text{(P)} \qquad P[\alpha(\cdot)] := \int_0^T (1 - \alpha(t))\, x(t)\, dt.$$
How can we characterize an optimal control $\alpha^*(\cdot)$?

Introducing the maximum principle. We apply the Pontryagin Maximum Principle, and to simplify notation we will not write the superscripts $*$ for the optimal control, trajectory, etc. We have $n = m = 1$,
$$f(x,a) = xa, \qquad g \equiv 0, \qquad r(x,a) = (1 - a)x;$$
and therefore
$$H(x,p,a) = f(x,a)p + r(x,a) = pxa + (1 - a)x = x + ax(p - 1).$$
The dynamical equation is
$$\text{(ODE)} \qquad \dot{x}(t) = H_p = \alpha(t)x(t),$$
and the adjoint equation is
$$\text{(ADJ)} \qquad \dot{p}(t) = -H_x = -1 - \alpha(t)(p(t) - 1).$$
The terminal condition reads
$$\text{(T)} \qquad p(T) = g_x(x(T)) = 0.$$
Lastly, the maximality principle asserts
$$\text{(M)} \qquad H(x(t), p(t), \alpha(t)) = \max_{0 \le a \le 1} \{x(t) + a x(t)(p(t) - 1)\}.$$

Using the maximum principle. We now deduce useful information from (ODE), (ADJ), (M) and (T).

According to (M), at each time $t$ the control value $\alpha(t)$ must be selected to maximize $a(p(t) - 1)$ for $0 \le a \le 1$. This is so, since $x(t) > 0$ (indeed $x(t) = x^0 e^{\int_0^t \alpha(s)\, ds}$). Thus
$$\alpha(t) = \begin{cases} 1 & \text{if } p(t) > 1 \\ 0 & \text{if } p(t) \le 1. \end{cases}$$
Hence if we know $p(\cdot)$, we can design the optimal control $\alpha(\cdot)$.

So next we must solve for the costate $p(\cdot)$. We know from (ADJ) and (T) that
$$\begin{cases} \dot{p}(t) = -1 - \alpha(t)[p(t) - 1] & (0 \le t \le T) \\ p(T) = 0. \end{cases}$$
Since $p(T) = 0$, we deduce by continuity that $p(t) \le 1$ for $t$ close to $T$, $t \le T$. Thus $\alpha(t) = 0$ for such values of $t$. Therefore $\dot{p}(t) = -1$, and consequently $p(t) = T - t$ for times $t$ in this interval. So we have that $p(t) = T - t$ so long as $p(t) \le 1$. And this holds for
$$T - 1 \le t \le T.$$
(Here we assume $T > 1$; if $T \le 1$, then $p(t) = T - t \le 1$ on all of $[0,T]$, and so $\alpha \equiv 0$.)

But for times $t \le T - 1$, with $t$ near $T - 1$, we have $\alpha(t) = 1$; and so (ADJ) becomes
$$\dot{p}(t) = -1 - (p(t) - 1) = -p(t).$$
Since $p(T - 1) = 1$, we see that $p(t) = e^{T - 1 - t} > 1$ for all times $0 \le t < T - 1$. In particular there are no switches in the control over this time interval.

Restoring the superscript $*$ to our notation, we consequently deduce that an optimal control is
$$\alpha^*(t) = \begin{cases} 1 & \text{if } 0 \le t \le t^* \\ 0 & \text{if } t^* < t \le T \end{cases}$$
for the optimal switching time $t^* = T - 1$.

We leave it as an exercise to compute the switching time if the growth constant $k \ne 1$.
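As a numerical cross-check of the switching time $t^* = T - 1$, here is a sketch under our own simplifying assumption: we compare only controls that equal $1$ on $[0,s]$ and $0$ on $(s,T]$. For such a control $x(t) = x^0 e^t$ on $[0,s]$ and $x(t) = x^0 e^s$ afterwards, so that $P = x^0 e^s (T - s)$. A grid search over $s$ recovers $s = T - 1$ when $T > 1$ and $s = 0$ when $T \le 1$; this does not by itself prove optimality among all controls. The helper `costate` simply encodes the formula for $p(\cdot)$ derived above.

```python
import math

def payoff_one_switch(s, T, x0=1.0):
    """P for alpha = 1 on [0, s] and 0 on (s, T]: output x0 e^s consumed over a time span T - s."""
    return x0 * math.exp(s) * (T - s)

def best_switch(T, samples=100_000):
    """Grid search over switching times s in [0, T]."""
    grid = [T * k / samples for k in range(samples + 1)]
    return max(grid, key=lambda s: payoff_one_switch(s, T))

def costate(t, T):
    """p(t) from the text (T > 1): T - t on [T - 1, T], and e^{T - 1 - t} before T - 1."""
    return T - t if t >= T - 1 else math.exp(T - 1 - t)

for T in (3.0, 0.5):  # hypothetical horizons
    print(T, best_switch(T))
```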














4.4.3 EXAMPLE 3: A SIMPLE LINEAR-QUADRATIC REGULATOR. We take $n = m = 1$ for this example, and consider the simple linear dynamics
$$\text{(ODE)} \qquad \begin{cases} \dot{x}(t) = x(t) + \alpha(t) \\ x(0) = x^0, \end{cases}$$
with the quadratic cost functional
$$\int_0^T x(t)^2 + \alpha(t)^2\, dt,$$
which we want to minimize. So we want to maximize the payoff functional
$$\text{(P)} \qquad P[\alpha(\cdot)] = -\int_0^T x(t)^2 + \alpha(t)^2\, dt.$$
For this problem, the values of the controls are not constrained; that is, $A = \mathbb{R}$.

Introducing the maximum principle. To simplify notation further we again drop the superscripts $*$. We have $n = m = 1$,
$$f(x,a) = x + a, \qquad g \equiv 0, \qquad r(x,a) = -x^2 - a^2;$$
and hence
$$H(x,p,a) = fp + r = (x + a)p - (x^2 + a^2).$$
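The maximality condition becomes
$$\text{(M)} \qquad H(x(t), p(t), \alpha(t)) = \max_{a \in \mathbb{R}} \{-(x(t)^2 + a^2) + p(t)(x(t) + a)\}.$$
We calculate the maximum on the right hand side by setting $H_a = -2a + p = 0$. Thus $a = \frac{p}{2}$, and so
$$\alpha(t) = \frac{p(t)}{2}.$$
The dynamical equations are therefore
$$\text{(ODE)} \qquad \dot{x}(t) = x(t) + \frac{p(t)}{2}$$
and
$$\text{(ADJ)} \qquad \dot{p}(t) = -H_x = 2x(t) - p(t).$$
Moreover $x(0) = x^0$, and the terminal condition is
$$\text{(T)} \qquad p(T) = 0.$$

Using the Maximum Principle. So we must look at the system of equations
$$\begin{pmatrix} \dot{x} \\ \dot{p} \end{pmatrix} = \begin{pmatrix} 1 & 1/2 \\ 2 & -1 \end{pmatrix} \begin{pmatrix} x \\ p \end{pmatrix},$$
the general solution of which is
$$\begin{pmatrix} x(t) \\ p(t) \end{pmatrix} = e^{t\begin{pmatrix} 1 & 1/2 \\ 2 & -1 \end{pmatrix}} \begin{pmatrix} x^0 \\ p^0 \end{pmatrix}.$$
Since we know $x^0$, the task is to choose $p^0$ so that $p(T) = 0$.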


Feedback controls. An elegant way to do so is to try to find an optimal control in linear feedback form; that is, to look for a function $c(\cdot) : [0,T] \to \mathbb{R}$ for which
$$\alpha(t) = c(t)\, x(t).$$
We henceforth suppose that an optimal feedback control of this form exists, and attempt to calculate $c(\cdot)$. Now
$$\frac{p(t)}{2} = \alpha(t) = c(t)\, x(t);$$
whence $c(t) = \frac{p(t)}{2x(t)}$. Define now
$$d(t) := \frac{p(t)}{x(t)},$$
so that $c(t) = \frac{d(t)}{2}$.

We will next discover a differential equation that $d(\cdot)$ satisfies. Compute
$$\dot{d} = \frac{\dot{p}}{x} - \frac{p\,\dot{x}}{x^2},$$
and recall that
$$\begin{cases} \dot{x} = x + \frac{p}{2} \\ \dot{p} = 2x - p. \end{cases}$$
Therefore
$$\dot{d} = \frac{2x - p}{x} - \frac{p}{x^2}\Big(x + \frac{p}{2}\Big) = 2 - d - d\Big(1 + \frac{d}{2}\Big) = 2 - 2d - \frac{d^2}{2}.$$
Since $p(T) = 0$, the terminal condition is $d(T) = 0$.

So we have obtained a nonlinear first-order ODE in $d(\cdot)$ with a terminal boundary condition:
$$\text{(R)} \qquad \begin{cases} \dot{d} = 2 - 2d - \frac{1}{2}d^2 & (0 \le t < T) \\ d(T) = 0. \end{cases}$$
This is called the Riccati equation.
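In summary so far, to solve our linear-quadratic regulator problem, we need to first
solve the Riccati equation (R) and then set

$$\alpha(t) = \frac{d(t)}{2}\, x(t).$$

How to solve the Riccati equation. It turns out that we can convert (R) into a
second-order, linear ODE. To accomplish this, write

$$d(t) = \frac{2\dot b(t)}{b(t)}$$

for a function $b(\cdot)$ to be found. What equation does $b(\cdot)$ solve? We compute

$$\dot d = \frac{2\ddot b}{b} - \frac{2(\dot b)^2}{b^2} = \frac{2\ddot b}{b} - \frac{d^2}{2}.$$

Hence (R) gives

$$\frac{2\ddot b}{b} = \dot d + \frac{d^2}{2} = 2 - 2d = 2 - \frac{4\dot b}{b};$$

and consequently

$$\begin{cases} \ddot b = b - 2\dot b & (0 \le t < T) \\ \dot b(T) = 0, \quad b(T) = 1. \end{cases}$$

This is a terminal-value problem for a second-order linear ODE, which we can solve by
standard techniques. We then set $d = \frac{2\dot b}{b}$ to derive the solution of the Riccati equation
(R).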
We will generalize this example later to systems, in §5.2. 
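As a sanity check on the substitution $d = 2\dot b/b$, here is a minimal Python sketch (our own illustration; the function names are hypothetical). It integrates (R) backward from $d(T) = 0$ with the classical fourth-order Runge-Kutta method and compares the result with the closed form obtained from $\ddot b = b - 2\dot b$, $b(T) = 1$, $\dot b(T) = 0$, whose characteristic roots are $r = -1 \pm \sqrt2$.

```python
import math

def riccati_rhs(d):
    # right-hand side of (R): d' = 2 - 2d - d^2/2
    return 2.0 - 2.0 * d - 0.5 * d * d

def d_rk4(t, T, steps=2000):
    """Integrate (R) from d(T) = 0 back to time t with classical RK4 (negative step)."""
    h = (t - T) / steps
    d = 0.0
    for _ in range(steps):
        k1 = riccati_rhs(d)
        k2 = riccati_rhs(d + 0.5 * h * k1)
        k3 = riccati_rhs(d + 0.5 * h * k2)
        k4 = riccati_rhs(d + h * k3)
        d += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return d

def d_closed_form(t, T):
    """d = 2 b'/b, where b'' = b - 2b', b(T) = 1, b'(T) = 0."""
    r1, r2 = -1.0 + math.sqrt(2.0), -1.0 - math.sqrt(2.0)   # roots of r^2 + 2r - 1 = 0
    A, B = -r2 / (r1 - r2), r1 / (r1 - r2)                 # enforce b(T) = 1, b'(T) = 0
    s = t - T
    b = A * math.exp(r1 * s) + B * math.exp(r2 * s)
    b_dot = A * r1 * math.exp(r1 * s) + B * r2 * math.exp(r2 * s)
    return 2.0 * b_dot / b

T = 1.0
for t in (0.0, 0.5, 1.0):
    print(t, d_rk4(t, T), d_closed_form(t, T))
# the optimal feedback is then alpha(t) = d(t) x(t) / 2
```

As $T - t$ grows, $d(t)$ approaches the root $-2 - 2\sqrt2$ of $2 - 2d - \frac12 d^2 = 0$, so the feedback gain $d/2$ approaches $-(1 + \sqrt2)$.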














4.4.4 EXAMPLE 4: MOON LANDER. This is a much more elaborate and interesting
example, already introduced in Chapter 1. We follow the discussion of Fleming and
Rishel [F-R].

Introduce the notation

$h(t)$ = height at time $t$
$v(t)$ = velocity = $\dot h(t)$
$m(t)$ = mass of spacecraft (changing as fuel is used up)
$\alpha(t)$ = thrust at time $t$.

The thrust is constrained so that $0 \le \alpha(t) \le 1$; that is, $A = [0,1]$. There are also the
constraints that the height and mass be nonnegative: $h(t) \ge 0$, $m(t) \ge 0$.
The dynamics are

(ODE) $$\begin{cases} \dot h(t) = v(t) \\ \dot v(t) = -g + \frac{\alpha(t)}{m(t)} \\ \dot m(t) = -k\alpha(t), \end{cases}$$

with initial conditions

$$\begin{cases} h(0) = h_0 > 0 \\ v(0) = v_0 \\ m(0) = m_0 > 0. \end{cases}$$

The goal is to land on the moon safely, maximizing the remaining fuel $m(\tau)$, where
$\tau = \tau[\alpha(\cdot)]$ is the first time $h(\tau) = v(\tau) = 0$. Since $\alpha = -\frac{\dot m}{k}$, our intention is equivalently
to minimize the total applied thrust before landing; so that

(P) $$P[\alpha(\cdot)] = -\int_0^\tau \alpha(t)\, dt.$$

This is so since

$$\int_0^\tau \alpha(t)\, dt = \frac{m_0 - m(\tau)}{k}.$$


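Introducing the maximum principle. In terms of the general notation, we have

$$x(t) = \begin{pmatrix} h(t) \\ v(t) \\ m(t) \end{pmatrix}, \qquad f = \begin{pmatrix} v \\ -g + a/m \\ -ka \end{pmatrix}.$$

Hence the Hamiltonian is

$$\begin{aligned} H(x,p,a) &= f \cdot p + r \\ &= (v, -g + a/m, -ka) \cdot (p_1, p_2, p_3) - a \\ &= -a + p_1 v + p_2\left(-g + \frac{a}{m}\right) + p_3(-ka). \end{aligned}$$

We next have to figure out the adjoint dynamics (ADJ). For our particular Hamiltonian,

$$H_{x_1} = H_h = 0, \qquad H_{x_2} = H_v = p_1, \qquad H_{x_3} = H_m = -\frac{p_2 a}{m^2}.$$

Therefore

(ADJ) $$\begin{cases} \dot p_1(t) = 0 \\ \dot p_2(t) = -p_1(t) \\ \dot p_3(t) = \frac{p_2(t)\alpha(t)}{m(t)^2}. \end{cases}$$

The maximization condition (M) reads

$$\begin{aligned} H(x(t),p(t),\alpha(t)) &= \max_{0 \le a \le 1} H(x(t),p(t),a) \\ &= \max_{0 \le a \le 1} \left\{ -a + p_1(t)v(t) + p_2(t)\left[-g + \frac{a}{m(t)}\right] + p_3(t)(-ka) \right\} \\ &= p_1(t)v(t) - p_2(t)g + \max_{0 \le a \le 1} \left\{ a\left(-1 + \frac{p_2(t)}{m(t)} - kp_3(t)\right) \right\}. \end{aligned}$$

Thus the optimal control law is given by the rule:

$$\alpha(t) = \begin{cases} 1 & \text{if } 1 - \frac{p_2(t)}{m(t)} + kp_3(t) < 0 \\ 0 & \text{if } 1 - \frac{p_2(t)}{m(t)} + kp_3(t) > 0. \end{cases}$$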


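Using the maximum principle. Now we will attempt to figure out the form of the
solution, and check that it accords with the Maximum Principle.

Let us start by guessing that we first leave the rocket engine off (i.e., set $\alpha \equiv 0$) and turn
the engine on only at the end. Denote by $\tau$ the first time that $h(\tau) = v(\tau) = 0$, meaning
that we have landed. We guess that there exists a switching time $t^* < \tau$ when we turn the
engines on at full power (i.e., set $\alpha \equiv 1$). Consequently,

$$\alpha(t) = \begin{cases} 0 & \text{for } 0 \le t \le t^* \\ 1 & \text{for } t^* \le t \le \tau. \end{cases}$$

Therefore, for times $t^* \le t \le \tau$ our ODE becomes

$$\begin{cases} \dot h(t) = v(t) \\ \dot v(t) = -g + \frac{1}{m(t)} & (t^* \le t \le \tau) \\ \dot m(t) = -k \end{cases}$$

with $h(\tau) = 0$, $v(\tau) = 0$, $m(t^*) = m_0$. We solve these dynamics:

$$\begin{cases} m(t) = m_0 + k(t^* - t) \\ v(t) = g(\tau - t) + \frac{1}{k}\log\left[\frac{m_0 + k(t^* - \tau)}{m_0 + k(t^* - t)}\right] \\ h(t) = \text{complicated formula}. \end{cases}$$

Now put $t = t^*$:

$$\begin{cases} m(t^*) = m_0 \\ v(t^*) = g(\tau - t^*) + \frac{1}{k}\log\left[\frac{m_0 + k(t^* - \tau)}{m_0}\right] \\ h(t^*) = -\frac{g(t^* - \tau)^2}{2} - \frac{m_0}{k^2}\log\left[\frac{m_0 + k(t^* - \tau)}{m_0}\right] + \frac{t^* - \tau}{k}. \end{cases}$$

Suppose the total amount of fuel to start with was $m_1$; so that $m_0 - m_1$ is the weight
of the empty spacecraft. When $\alpha \equiv 1$, the fuel is used up at rate $k$. Hence

$$k(\tau - t^*) \le m_1, \qquad \text{and so} \qquad 0 \le \tau - t^* \le \frac{m_1}{k}.$$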


[Figure: powered descent trajectory ($\alpha \equiv 1$) in the $(v, h)$ plane.]


Before time $t^*$, we set $\alpha \equiv 0$. Then (ODE) reads

$$\dot h = v, \qquad \dot v = -g, \qquad \dot m = 0;$$

and thus

$$m(t) \equiv m_0, \qquad v(t) = -gt + v_0, \qquad h(t) = -\tfrac{1}{2}gt^2 + tv_0 + h_0.$$

We combine the formulas for $v(t)$ and $h(t)$, to discover

$$h(t) = h_0 - \frac{1}{2g}\left(v^2(t) - v_0^2\right) \qquad (0 \le t \le t^*).$$

We deduce that the freefall trajectory $(v(t), h(t))$ therefore lies on a parabola

$$h = h_0 - \frac{1}{2g}\left(v^2 - v_0^2\right).$$


[Figure: freefall trajectory ($\alpha \equiv 0$) and powered trajectory ($\alpha \equiv 1$) in the $(v, h)$ plane.]


If we then move along this parabola until we hit the soft-landing curve from the previous 
picture, we can then turn on the rocket engine and land safely. 


[Figure: freefall and powered trajectories in the $(v, h)$ plane, in two cases.]


In the second case illustrated, we miss the switching curve, and hence cannot land safely on
the moon by switching once.
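The geometric recipe above (fall freely along the parabola, then burn at full thrust once the switching curve is reached) can be turned into a small computation. In the sketch below, which is our own illustration rather than part of [F-R] or these notes, `switching_state(L)` evaluates $(v(t^*), h(t^*))$ from the formulas above for a burn of length $L = \tau - t^*$, and bisection finds the $L$ at which the freefall parabola meets the switching curve. It assumes $g m_0 < 1$ (full thrust overcomes gravity, which makes the gap function increasing in $L$) and $m_1 < m_0$; the parameter values are made up and nondimensional. A final RK4 integration of (ODE) checks that the craft arrives with $h = v = 0$.

```python
import math

def switching_state(L, g, m0, k):
    """(v, h) at the switch time t* for a full-thrust burn of length L = tau - t*
    ending with h = v = 0 (formulas for v(t*) and h(t*) above)."""
    log_ratio = math.log((m0 - k * L) / m0)
    v = g * L + log_ratio / k
    h = -0.5 * g * L * L - (m0 / k**2) * log_ratio - L / k
    return v, h

def find_switch(h0, v0, m0, m1, g, k, iters=200):
    """Bisection for the burn length L where the freefall parabola
    h = h0 - (v^2 - v0^2)/(2g) meets the switching curve.
    Returns (t_star, L), or None if no single-switch soft landing exists."""
    def gap(L):
        v, h = switching_state(L, g, m0, k)
        return h - (h0 - (v * v - v0 * v0) / (2.0 * g))
    lo, hi = 0.0, m1 / k                  # fuel bound: k (tau - t*) <= m1
    if gap(hi) < 0:                       # even burning all the fuel cannot reach the curve
        return None
    for _ in range(iters):                # gap(0) < 0, and gap increases when g * m0 < 1
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    L = 0.5 * (lo + hi)
    t_star = (v0 - switching_state(L, g, m0, k)[0]) / g   # freefall: v(t*) = v0 - g t*
    return (t_star, L) if t_star >= 0 else None

def fly(h0, v0, m0, g, k, t_star, L, steps=20000):
    """Freefall (closed form) until t*, then RK4 on (ODE) with alpha = 1 for time L."""
    state = (h0 + v0 * t_star - 0.5 * g * t_star**2, v0 - g * t_star, m0)
    rhs = lambda s: (s[1], -g + 1.0 / s[2], -k)           # (h', v', m') with alpha = 1
    dt = L / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(tuple(x + 0.5 * dt * y for x, y in zip(state, k1)))
        k3 = rhs(tuple(x + 0.5 * dt * y for x, y in zip(state, k2)))
        k4 = rhs(tuple(x + dt * y for x, y in zip(state, k3)))
        state = tuple(x + dt * (a + 2 * b + 2 * c + d) / 6
                      for x, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state

params = dict(h0=10.0, v0=0.0, m0=0.5, g=1.0, k=0.1)     # made-up values with g * m0 < 1
t_star, L = find_switch(m1=0.4, **params)
h, v, m = fly(t_star=t_star, L=L, **params)
print(t_star, L, h, v, m)   # h and v at touchdown are zero up to integration error
```

A `None` result from `find_switch` corresponds to the second case in the picture: the freefall parabola misses the switching curve within the available fuel.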
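To justify our guess about the structure of the optimal control, let us now find the
costate $p(\cdot)$ so that $\alpha(\cdot)$ and $x(\cdot)$ described above satisfy (ODE), (ADJ), (M). To do this,
we will have to figure out appropriate initial conditions

$$p_1(0) = \lambda_1, \qquad p_2(0) = \lambda_2, \qquad p_3(0) = \lambda_3.$$

We solve (ADJ) for $\alpha(\cdot)$ as above, and find

$$\begin{cases} p_1(t) \equiv \lambda_1 & (0 \le t \le \tau) \\ p_2(t) = \lambda_2 - \lambda_1 t & (0 \le t \le \tau) \\ p_3(t) = \begin{cases} \lambda_3 & (0 \le t \le t^*) \\ \lambda_3 + \int_{t^*}^t \frac{\lambda_2 - \lambda_1 s}{(m_0 + k(t^* - s))^2}\, ds & (t^* \le t \le \tau). \end{cases} \end{cases}$$

Define

$$r(t) := 1 - \frac{p_2(t)}{m(t)} + p_3(t)k;$$

then

$$\dot r = -\frac{\dot p_2}{m} + \frac{p_2 \dot m}{m^2} + \dot p_3 k = \frac{\lambda_1}{m} + \frac{p_2}{m^2}(-k\alpha) + \left(\frac{p_2\alpha}{m^2}\right)k = \frac{\lambda_1}{m(t)}.$$

Choose $\lambda_1 < 0$, so that $r$ is decreasing. We calculate

$$r(t^*) = 1 - \frac{\lambda_2 - \lambda_1 t^*}{m_0} + \lambda_3 k$$

and then adjust $\lambda_2, \lambda_3$ so that $r(t^*) = 0$.
Then $r$ is nonincreasing, $r(t^*) = 0$, and consequently $r > 0$ on $[0, t^*)$, $r < 0$ on $(t^*, \tau]$.
But (M) says

$$\alpha(t) = \begin{cases} 1 & \text{if } r(t) < 0 \\ 0 & \text{if } r(t) > 0. \end{cases}$$

Thus an optimal control changes just once from 0 to 1; and so our earlier guess of $\alpha(\cdot)$
does indeed satisfy the Pontryagin Maximum Principle.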
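The sign argument for $r(\cdot)$ can be checked numerically. The following sketch is our own, with hypothetical names and made-up parameter values: it builds $p_2$ and $p_3$ from the formulas above, evaluates the $p_3$ integral by composite Simpson's rule, and compares $r(t) = 1 - p_2/m + kp_3$ with the closed form obtained by integrating $\dot r = \lambda_1/m$ from $r(t^*) = 0$, namely $r(t) = \lambda_1(t - t^*)/m_0$ before the switch and $r(t) = -\frac{\lambda_1}{k}\log\frac{m(t)}{m_0}$ after it.

```python
import math

def make_r(t_star, m0, k, lam1, lam3):
    """Return r(t) = 1 - p2(t)/m(t) + k p3(t) along the guessed control (alpha = 0, then 1)."""
    lam2 = lam1 * t_star + m0 * (1.0 + k * lam3)          # chosen so that r(t*) = 0
    p2 = lambda t: lam2 - lam1 * t                          # p1 = lam1, p2' = -p1
    mass = lambda t: m0 if t <= t_star else m0 - k * (t - t_star)

    def p3(t, n=2000):
        if t <= t_star:
            return lam3                                     # p3' = p2 alpha / m^2 = 0 before t*
        f = lambda s: p2(s) / mass(s) ** 2                  # alpha = 1 after t*
        h = (t - t_star) / n                                # composite Simpson, n even
        total = f(t_star) + f(t)
        for i in range(1, n):
            total += (4 if i % 2 else 2) * f(t_star + i * h)
        return lam3 + total * h / 3.0

    return lambda t: 1.0 - p2(t) / mass(t) + k * p3(t)

def r_closed(t, t_star, m0, k, lam1):
    """Solution of r' = lam1 / m(t) with r(t*) = 0."""
    if t <= t_star:
        return lam1 * (t - t_star) / m0
    return -(lam1 / k) * math.log((m0 - k * (t - t_star)) / m0)

t_star, m0, k, lam1, lam3 = 3.0, 0.5, 0.1, -1.0, 0.2       # made-up values, lam1 < 0
r = make_r(t_star, m0, k, lam1, lam3)
for t in (0.0, 1.5, 3.0, 4.0, 5.0):
    print(t, r(t), r_closed(t, t_star, m0, k, lam1))        # positive before t*, negative after
```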





4.5 MAXIMUM PRINCIPLE WITH TRANSVERSALITY CONDITIONS
Consider again the dynamics

(ODE) $$\dot x(t) = f(x(t), \alpha(t)) \qquad (t > 0).$$

In this section we discuss another variant problem, one for which the initial position is
constrained to lie in a given set $X_0 \subset \mathbb{R}^n$ and the final position is also constrained to lie
within a given set $X_1 \subset \mathbb{R}^n$. So in this model we get to choose the starting point $x^0 \in X_0$
in order to maximize

(P) $$P[\alpha(\cdot)] = \int_0^\tau r(x(t), \alpha(t))\, dt,$$

where $\tau = \tau[\alpha(\cdot)]$ is the first time we hit $X_1$.


NOTATION. We will assume that $X_0, X_1$ are in fact smooth surfaces in $\mathbb{R}^n$. We let
$T_0$ denote the tangent plane to $X_0$ at $x^0$, and $T_1$ the tangent plane to $X_1$ at $x^1$.


THEOREM 4.5 (MORE TRANSVERSALITY CONDITIONS). Let $\alpha^*(\cdot)$ and
$x^*(\cdot)$ solve the problem above, with

$$x^0 = x^*(0), \qquad x^1 = x^*(\tau^*).$$

Then there exists a function $p^*(\cdot) : [0, \tau^*] \to \mathbb{R}^n$, such that (ODE), (ADJ) and (M) hold
for $0 \le t \le \tau^*$. In addition,

(T) $$\begin{cases} p^*(\tau^*) \text{ is perpendicular to } T_1, \\ p^*(0) \text{ is perpendicular to } T_0. \end{cases}$$

We call (T) the transversality conditions.


REMARKS AND INTERPRETATIONS. (i) If we have $T > 0$ fixed and

$$P[\alpha(\cdot)] = \int_0^T r(x(t), \alpha(t))\, dt + g(x(T)),$$

then (T) says

$$p^*(T) = \nabla g(x^*(T)),$$

in agreement with our earlier form of the terminal/transversality condition.


(ii) Suppose that the surface $X_1$ is given by $X_1 = \{x \mid g_k(x) = 0,\ k = 1, \dots, l\}$. Then
(T) says that $p^*(\tau^*)$ belongs to the "orthogonal complement" of the subspace $T_1$. But the
orthogonal complement of $T_1$ is the span of $\nabla g_k(x^1)$ $(k = 1, \dots, l)$. Thus

$$p^*(\tau^*) = \sum_{k=1}^{l} \lambda_k \nabla g_k(x^1)$$

for some unknown constants $\lambda_1, \dots, \lambda_l$.





4.6 MORE APPLICATIONS
4.6.1 EXAMPLE 1: DISTANCE BETWEEN TWO SETS. As a first and simple
example, let

(ODE) $$\dot x(t) = \alpha(t)$$

for $A = S^1$, the unit circle in $\mathbb{R}^2$: $a \in S^1$ if and only if $|a|^2 = a_1^2 + a_2^2 = 1$. In other words,
we are considering only curves that move with unit speed.
We take

(P) $$P[\alpha(\cdot)] = -\int_0^\tau |\dot x(t)|\, dt = -\text{the length of the curve} = -\int_0^\tau dt = -\text{time it takes to reach } X_1.$$

We want to minimize the length of the curve and, as a check on our general theory, will
prove that the minimum is of course a straight line.

Using the maximum principle. We have

$$H(x,p,a) = f(x,a) \cdot p + r(x,a) = a \cdot p - 1 = p_1 a_1 + p_2 a_2 - 1.$$

The adjoint dynamics equation (ADJ) says

$$\dot p(t) = -\nabla_x H(x(t), p(t), \alpha(t)) = 0,$$

and therefore

$$p(t) \equiv \text{constant} = p^0 \ne 0.$$

The maximization principle (M) tells us that

$$H(x(t), p(t), \alpha(t)) = \max_{a \in S^1}\left[-1 + p^0_1 a_1 + p^0_2 a_2\right].$$

The right hand side is maximized by $a^0 = \frac{p^0}{|p^0|}$, a unit vector that points in the same
direction as $p^0$. Thus $\alpha(\cdot) \equiv a^0$ is constant in time. According then to (ODE) we have
$\dot x = a^0$, and consequently $x(\cdot)$ is a straight line.

Finally, the transversality conditions say that

(T) $$p(0) \perp T_0, \qquad p(\tau) \perp T_1.$$

In other words, $p^0 \perp T_0$ and $p^0 \perp T_1$; and this means that the tangent planes $T_0$ and $T_1$
are parallel.





Now all of this is pretty obvious from the picture, but it is reassuring that the general
theory predicts the proper answer.
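For two disjoint disks the conclusion can also be seen numerically. The sketch below is our own illustration and uses a technique not discussed in these notes, alternating projections: starting from a point on one circle, project back and forth between the two circles. For two disjoint disks the iterates converge to a closest pair of points, and one can check that the connecting segment is normal to both circles, so that the tangent lines $T_0$, $T_1$ at its endpoints are parallel. The centers and radii are made up.

```python
import math

def project_to_circle(p, center, radius):
    """Nearest point on the circle |x - center| = radius to a point p outside it."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    d = math.hypot(dx, dy)
    return (center[0] + radius * dx / d, center[1] + radius * dy / d)

def closest_points(c0, r0, c1, r1, iters=100):
    """Alternating projections between two disjoint circles X0 and X1."""
    x = (c0[0] + r0, c0[1])                 # arbitrary starting point on X0
    for _ in range(iters):
        y = project_to_circle(x, c1, r1)
        x = project_to_circle(y, c0, r0)
    return x, y

def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]

c0, r0, c1, r1 = (0.0, 0.0), 1.0, (4.0, 3.0), 2.0
x, y = closest_points(c0, r0, c1, r1)
seg = (y[0] - x[0], y[1] - x[1])
print(x, y, math.hypot(*seg))               # distance |c1 - c0| - r0 - r1 = 2
# the segment is parallel to both normals, so the tangent lines at x and y are parallel
print(cross(seg, (x[0] - c0[0], x[1] - c0[1])), cross(seg, (c1[0] - y[0], c1[1] - y[1])))
```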





4.6.2 EXAMPLE 2: COMMODITY TRADING. Next is a simple model for the
trading of a commodity, say wheat. We let $T$ be the fixed length of the trading period, and
introduce the variables

$x^1(t)$ = money on hand at time $t$
$x^2(t)$ = amount of wheat owned at time $t$
$\alpha(t)$ = rate of buying or selling of wheat
$q(t)$ = price of wheat at time $t$ (known)
$\lambda$ = cost of storing a unit amount of wheat for a unit of time.

We suppose that the price of wheat $q(t)$ is known for the entire trading period $0 \le t \le T$
(although this is probably unrealistic in practice). We assume also that the rate of selling
and buying is constrained:

$$|\alpha(t)| \le M,$$

where $\alpha(t) > 0$ means buying wheat, and $\alpha(t) < 0$ means selling.
Our intention is to maximize our holdings at the end time $T$, namely the sum of the
cash on hand and the value of the wheat we then own:

(P) $$P[\alpha(\cdot)] = x^1(T) + q(T)x^2(T).$$

The evolution is

(ODE) $$\begin{cases} \dot x^1(t) = -\lambda x^2(t) - q(t)\alpha(t) \\ \dot x^2(t) = \alpha(t). \end{cases}$$

This is a nonautonomous (= time dependent) case, but it turns out that the Pontryagin
Maximum Principle still applies.

Using the maximum principle. What is our optimal buying and selling strategy?
First, we compute the Hamiltonian

$$H(x,p,t,a) = f \cdot p + r = p^1(-\lambda x^2 - q(t)a) + p^2 a,$$

since $r \equiv 0$. The adjoint dynamics read

(ADJ) $$\begin{cases} \dot p^1 = 0 \\ \dot p^2 = \lambda p^1, \end{cases}$$

with the terminal condition

(T) $$p(T) = \nabla g(x(T)).$$

In our case $g(x^1, x^2) = x^1 + q(T)x^2$, and hence

(T) $$\begin{cases} p^1(T) = 1 \\ p^2(T) = q(T). \end{cases}$$

We then can solve for the costate:

$$\begin{cases} p^1(t) \equiv 1 \\ p^2(t) = \lambda(t - T) + q(T). \end{cases}$$

The maximization principle (M) tells us that

(M) $$\begin{aligned} H(x(t), p(t), t, \alpha(t)) &= \max_{|a| \le M}\left\{p^1(t)(-\lambda x^2(t) - q(t)a) + p^2(t)a\right\} \\ &= -\lambda p^1(t)x^2(t) + \max_{|a| \le M}\left\{a(-q(t) + p^2(t))\right\}. \end{aligned}$$

So

$$\alpha(t) = \begin{cases} M & \text{if } q(t) < p^2(t) \\ -M & \text{if } q(t) > p^2(t) \end{cases}$$

for $p^2(t) := \lambda(t - T) + q(T)$.


CRITIQUE. In some situations the amount of money on hand $x^1(\cdot)$ becomes negative
for part of the time. The economic problem has a natural constraint $x^1 \ge 0$ (unless we can
borrow with no interest charges) which we did not take into account in the mathematical
model.
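To see the bang-bang rule at work, here is a small Python sketch, our own illustration with a made-up price curve $q(t) = 2 + \sin(2\pi t)$ and made-up values of $\lambda$, $M$, $T$. It simulates (ODE) with explicit Euler steps, evaluates the payoff (P), and compares the rule derived above with a few other admissible strategies. In line with the critique, nothing in the sketch enforces $x^1 \ge 0$.

```python
import math

def payoff(alpha, q, lam, T, x0=(0.0, 0.0), n=10000):
    """Explicit Euler for x1' = -lam*x2 - q(t)*a, x2' = a; returns P = x1(T) + q(T)*x2(T)."""
    dt = T / n
    x1, x2 = x0
    for j in range(n):
        t = j * dt
        a = alpha(t)
        x1, x2 = x1 + dt * (-lam * x2 - q(t) * a), x2 + dt * a
    return x1 + q(T) * x2

def bang_bang(q, lam, T, M):
    """Maximum Principle rule: buy at rate M when q(t) < p2(t), otherwise sell at rate M."""
    p2 = lambda t: lam * (t - T) + q(T)            # costate from (ADJ) and (T)
    return lambda t: M if q(t) < p2(t) else -M

q = lambda t: 2.0 + math.sin(2.0 * math.pi * t)    # made-up, known price curve
lam, T, M = 0.5, 1.0, 1.0
best = payoff(bang_bang(q, lam, T, M), q, lam, T)
others = {
    "do nothing": lambda t: 0.0,
    "always buy": lambda t: M,
    "always sell": lambda t: -M,
    "smooth": lambda t: M * math.sin(5.0 * t),
}
for name, strategy in others.items():
    print(name, payoff(strategy, q, lam, T), "<", best)
```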





4.7 MAXIMUM PRINCIPLE WITH STATE CONSTRAINTS
We return once again to our usual setting:

(ODE) $$\begin{cases} \dot x(t) = f(x(t), \alpha(t)) \\ x(0) = x^0, \end{cases}$$

(P) $$P[\alpha(\cdot)] = \int_0^\tau r(x(t), \alpha(t))\, dt$$

for $\tau = \tau[\alpha(\cdot)]$, the first time that $x(\tau) = x^1$. This is the fixed endpoint problem.

STATE CONSTRAINTS. We introduce a new complication by asking that our
dynamics $x(\cdot)$ must always remain within a given region $R \subset \mathbb{R}^n$. We will as above
suppose that $R$ has the explicit representation

$$R = \{x \in \mathbb{R}^n \mid g(x) \le 0\}$$

for a given function $g(\cdot) : \mathbb{R}^n \to \mathbb{R}$.





DEFINITION. It will be convenient to introduce the quantity

$$c(x,a) := \nabla g(x) \cdot f(x,a).$$

Notice that

$$\text{if } x(t) \in \partial R \text{ for times } s_0 \le t \le s_1, \text{ then } c(x(t), \alpha(t)) \equiv 0 \quad (s_0 \le t \le s_1).$$

This is so since $f$ is then tangent to $\partial R$, whereas $\nabla g$ is perpendicular.

THEOREM 4.6 (MAXIMUM PRINCIPLE FOR STATE CONSTRAINTS). Let
$\alpha^*(\cdot), x^*(\cdot)$ solve the control theory problem above. Suppose also that $x^*(t) \in \partial R$ for
$s_0 \le t \le s_1$.

Then there exists a costate function $p^*(\cdot) : [s_0, s_1] \to \mathbb{R}^n$ and there exists $\lambda^*(\cdot) : [s_0, s_1] \to \mathbb{R}$
such that (ODE) holds.

Also, for times $s_0 \le t \le s_1$ we have

(ADJ') $$\dot p^*(t) = -\nabla_x H(x^*(t), p^*(t), \alpha^*(t)) + \lambda^*(t)\nabla_x c(x^*(t), \alpha^*(t));$$

and

(M') $$H(x^*(t), p^*(t), \alpha^*(t)) = \max_{a \in A}\{H(x^*(t), p^*(t), a) \mid c(x^*(t), a) = 0\}.$$

To keep things simple, we have omitted some technical assumptions really needed for
the Theorem to be valid.


REMARKS AND INTERPRETATIONS (i) Let $A \subset \mathbb{R}^m$ be of this form:

$$A = \{a \in \mathbb{R}^m \mid g_1(a) \le 0, \dots, g_s(a) \le 0\}$$

for given functions $g_1, \dots, g_s : \mathbb{R}^m \to \mathbb{R}$. In this case we can use Lagrange multipliers to
deduce from (M') that

(M'') $$\nabla_a H(x^*(t), p^*(t), \alpha^*(t)) = \lambda^*(t)\nabla_a c(x^*(t), \alpha^*(t)) + \sum_{i=1}^s \mu_i^*(t)\nabla_a g_i(\alpha^*(t)).$$

The function $\lambda^*(\cdot)$ here is that appearing in (ADJ').

If $x^*(t)$ lies in the interior of $R$ for, say, the times $0 \le t < s_0$, then the ordinary Maximum
Principle holds then.

(ii) Jump conditions. In the situation above, we always have

$$p^*(s_0 - 0) = p^*(s_0 + 0),$$

where $s_0$ is a time that $x^*$ hits $\partial R$. In other words, there is no jump in $p^*$ when we hit
the boundary of the constraint $\partial R$.
However,

$$p^*(s_1 + 0) = p^*(s_1 - 0) - \lambda^*(s_1)\nabla g(x^*(s_1));$$

this says there is (possibly) a jump in $p^*(\cdot)$ when we leave $\partial R$.














4.8 MORE APPLICATIONS
4.8.1 EXAMPLE 1: SHORTEST DISTANCE BETWEEN TWO POINTS,
AVOIDING AN OBSTACLE.
What is the shortest path between two points that avoids the disk $B = B(0, r)$, as
drawn?
Let us take

(ODE) $$\begin{cases} \dot x(t) = \alpha(t) \\ x(0) = x^0 \end{cases}$$

for $A = S^1$, with the payoff

(P) $$P[\alpha(\cdot)] = -\int_0^\tau |\dot x|\, dt = -\text{length of the curve } x(\cdot).$$

We have

$$H(x,p,a) = f \cdot p + r = p_1 a_1 + p_2 a_2 - 1.$$

Case 1: avoiding the obstacle. Assume $x(t) \notin \partial B$ on some time interval. In this
case, the usual Pontryagin Maximum Principle applies, and we deduce as before that

$$\dot p = -\nabla_x H = 0.$$

Hence

(ADJ) $$p(t) \equiv \text{constant} = p^0.$$

Condition (M) says

$$H(x(t), p(t), \alpha(t)) = \max_{a \in S^1}(-1 + p^0_1 a_1 + p^0_2 a_2).$$

The maximum occurs for $\alpha = \frac{p^0}{|p^0|}$. Furthermore, since the terminal time is free, the
Hamiltonian vanishes along the optimal trajectory:

$$-1 + p^0_1\alpha_1 + p^0_2\alpha_2 \equiv 0;$$

and therefore $\alpha \cdot p^0 = 1$. This means that $|p^0| = 1$, and hence in fact $\alpha = p^0$. We have
proved that the trajectory $x(\cdot)$ is a straight line away from the obstacle.


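Case 2: touching the obstacle. Suppose now $x(t) \in \partial B$ for some time interval
$s_0 \le t \le s_1$. Now we use the modified version of the Maximum Principle, provided by
Theorem 4.6.

First we must calculate $c(x,a) = \nabla g(x) \cdot f(x,a)$. In our case,

$$R = \mathbb{R}^2 - B = \{x \mid x_1^2 + x_2^2 \ge r^2\} = \{x \mid g := r^2 - x_1^2 - x_2^2 \le 0\}.$$

Then $\nabla g = \begin{pmatrix} -2x_1 \\ -2x_2 \end{pmatrix}$. Since $f = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$, we have

$$c(x,a) = -2a_1x_1 - 2x_2a_2.$$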
Now condition (ADJ') implies

$$\dot p(t) = -\nabla_x H + \lambda(t)\nabla_x c;$$

which is to say,

(4.6) $$\begin{cases} \dot p^1 = -2\lambda\alpha^1 \\ \dot p^2 = -2\lambda\alpha^2. \end{cases}$$

Next, we employ the maximization principle (M'). We need to maximize

$$H(x(t), p(t), a)$$

subject to the requirements that $c(x(t), a) = 0$ and $g_1(a) = a_1^2 + a_2^2 - 1 = 0$, since $A = \{a \in \mathbb{R}^2 \mid a_1^2 + a_2^2 = 1\}$.
According to (M'') we must solve

$$\nabla_a H = \lambda(t)\nabla_a c + \mu(t)\nabla_a g_1;$$

that is,

$$\begin{cases} p^1 = \lambda(-2x^1) + \mu 2\alpha^1 \\ p^2 = \lambda(-2x^2) + \mu 2\alpha^2. \end{cases}$$

We can combine these identities to eliminate $\mu$. Since we also know that $x(t) \in \partial B$, we
have $(x^1)^2 + (x^2)^2 = r^2$; and also $\alpha = (\alpha^1, \alpha^2)^T$ is tangent to $\partial B$. Using these facts,
we find after some calculations that

(4.7) $$\lambda = \frac{p^2\alpha^1 - p^1\alpha^2}{2r}.$$

But we also know

(4.8) $$(\alpha^1)^2 + (\alpha^2)^2 = 1$$

and

$$H \equiv 0 = -1 + p^1\alpha^1 + p^2\alpha^2;$$

hence

(4.9) $$p^1\alpha^1 + p^2\alpha^2 \equiv 1.$$

Solving for the unknowns. We now have the five equations (4.6)-(4.9) for the
five unknown functions $p^1, p^2, \alpha^1, \alpha^2, \lambda$ that depend on $t$. We introduce the angle $\theta$, as
illustrated, and note that $\frac{d}{d\theta} = r\frac{d}{dt}$. A calculation then confirms that the solutions are


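$$\begin{cases} \alpha^1(\theta) = -\sin\theta \\ \alpha^2(\theta) = \cos\theta, \end{cases} \qquad \lambda = -\frac{k + \theta}{2r},$$

and

$$\begin{cases} p^1(\theta) = k\cos\theta - \sin\theta + \theta\cos\theta \\ p^2(\theta) = k\sin\theta + \cos\theta + \theta\sin\theta, \end{cases}$$

for some constant $k$.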


Case 3: approaching and leaving the obstacle. In general, we must piece together
the results from Case 1 and Case 2. So suppose now $x(t) \in R = \mathbb{R}^2 - B$ for $0 \le t < s_0$
and $x(t) \in \partial B$ for $s_0 \le t \le s_1$.

We have shown that for times $0 \le t < s_0$, the trajectory $x(\cdot)$ is a straight line. For this case
we have shown already that $p = \alpha$ and therefore

$$\begin{cases} p^1 \equiv -\cos\phi_0 \\ p^2 \equiv \sin\phi_0, \end{cases}$$

for the angle $\phi_0$ as shown in the picture.
By the jump conditions, $p(\cdot)$ is continuous when $x(\cdot)$ hits $\partial B$ at the time $s_0$, meaning
in this case that

$$\begin{cases} k\cos\theta_0 - \sin\theta_0 + \theta_0\cos\theta_0 = -\cos\phi_0 \\ k\sin\theta_0 + \cos\theta_0 + \theta_0\sin\theta_0 = \sin\phi_0. \end{cases}$$

These identities hold if and only if

$$\begin{cases} k = -\theta_0 \\ \theta_0 + \phi_0 = \frac{\pi}{2}. \end{cases}$$

The second equality says that the optimal trajectory is tangent to the disk $B$ when it hits
$\partial B$.


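We turn next to the trajectory as it leaves $\partial B$: see the next picture. We then have

$$\begin{cases} p^1(\theta_1^-) = -\theta_0\cos\theta_1 - \sin\theta_1 + \theta_1\cos\theta_1 \\ p^2(\theta_1^-) = -\theta_0\sin\theta_1 + \cos\theta_1 + \theta_1\sin\theta_1. \end{cases}$$

Now our formulas above for $\lambda$ and $k$ imply $\lambda(\theta_1) = \frac{\theta_0 - \theta_1}{2r}$. The jump conditions give

$$p(\theta_1^+) = p(\theta_1^-) - \lambda(\theta_1)\nabla g(x(\theta_1))$$

for $g(x) = r^2 - x_1^2 - x_2^2$. Then

$$\lambda(\theta_1)\nabla g(x(\theta_1)) = (\theta_1 - \theta_0)\begin{pmatrix} \cos\theta_1 \\ \sin\theta_1 \end{pmatrix}.$$

Therefore

$$\begin{cases} p^1(\theta_1^+) = -\sin\theta_1 \\ p^2(\theta_1^+) = \cos\theta_1, \end{cases}$$

and so the trajectory is tangent to $\partial B$. If we apply the usual Maximum Principle after $x(\cdot)$
leaves $B$, we find

$$\begin{cases} p^1 \equiv \text{constant} = -\cos\phi_1 \\ p^2 \equiv \text{constant} = -\sin\phi_1. \end{cases}$$

Thus

$$\begin{cases} -\cos\phi_1 = -\sin\theta_1 \\ -\sin\phi_1 = \cos\theta_1 \end{cases}$$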
and so $\phi_1 = \theta_1 - \frac{\pi}{2}$ (mod $2\pi$).


CRITIQUE. We have carried out elaborate calculations to derive some pretty obvious
conclusions in this example. It is best to think of this as a confirmation in a simple case
of Theorem 4.6, which applies in far more complicated situations.
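Theorem 4.6 confirms the familiar picture: straight segments that meet and leave $\partial B$ tangentially, joined by an arc of $\partial B$. Assuming that picture, the length of the shortest path has a closed form, computed in the following sketch (our own illustration; the function names are hypothetical). With $a = |x^0| > r$ and $b = |x^1| > r$, the tangent segments have lengths $\sqrt{a^2 - r^2}$ and $\sqrt{b^2 - r^2}$, and, whenever the straight segment from $x^0$ to $x^1$ would cut through the disk, the arc subtends the angle between $x^0$ and $x^1$ (seen from the origin) minus $\arccos(r/a) + \arccos(r/b)$.

```python
import math

def segment_distance_to_origin(A, B):
    """Distance from the origin to the segment [A, B]."""
    dx, dy = B[0] - A[0], B[1] - A[1]
    L2 = dx * dx + dy * dy
    s = 0.0 if L2 == 0 else max(0.0, min(1.0, -(A[0] * dx + A[1] * dy) / L2))
    return math.hypot(A[0] + s * dx, A[1] + s * dy)

def shortest_path_around_disk(A, B, r):
    """Length of the shortest path from A to B avoiding the open disk B(0, r);
    assumes |A| > r and |B| > r."""
    if segment_distance_to_origin(A, B) >= r:
        return math.dist(A, B)                          # the straight segment is admissible
    a, b = math.hypot(*A), math.hypot(*B)
    cos_angle = (A[0] * B[0] + A[1] * B[1]) / (a * b)
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))   # angle between A and B at the origin
    arc = angle - math.acos(r / a) - math.acos(r / b)   # angle swept along the boundary circle
    return math.sqrt(a * a - r * r) + math.sqrt(b * b - r * r) + r * arc

print(shortest_path_around_disk((-2.0, 0.0), (2.0, 0.0), 1.0))   # 2*sqrt(3) + pi/3
print(shortest_path_around_disk((-2.0, 2.0), (2.0, 2.0), 1.0))   # 4.0, no contact with the disk
```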





4.8.2 AN INVENTORY CONTROL MODEL. Now we turn to a simple model
for ordering and storing items in a warehouse. Let the time period $T > 0$ be given, and
introduce the variables

$x(t)$ = amount of inventory at time $t$
$\alpha(t)$ = rate of ordering from manufacturers, $\alpha \ge 0$,
$d(t)$ = customer demand (known)
$\gamma$ = cost of ordering 1 unit
$\beta$ = cost of storing 1 unit.

Our goal is to fill all customer orders shipped from our warehouse, while keeping our
storage and ordering costs at a minimum. Hence the payoff to be maximized is

(P) $$P[\alpha(\cdot)] = -\int_0^T \gamma\alpha(t) + \beta x(t)\, dt.$$

We have $A = [0, \infty)$ and the constraint that $x(t) \ge 0$. The dynamics are

(ODE) $$\begin{cases} \dot x(t) = \alpha(t) - d(t) \\ x(0) = x^0 > 0. \end{cases}$$

Guessing the optimal strategy. Let us just guess the optimal control strategy: we
should at first not order anything ($\alpha = 0$) and let the inventory in our warehouse fall off
to zero as we fill demands; thereafter we should order just enough to meet our demands
($\alpha = d$).

[Figure: inventory level $x$ against $t$, with the time $s_0$ marked on the $t$ axis.]
Using the maximum principle. We will prove this guess is right, using the Maximum
Principle. Assume first that $x(t) > 0$ on some interval $[0, s_0]$. We then have

$$H(x, p, t, a) = (a - d(t))p - \gamma a - \beta x;$$

and (ADJ) says $\dot p = -\nabla_x H = \beta$. Condition (M) implies

$$\begin{aligned} H(x(t), p(t), t, \alpha(t)) &= \max_{a \ge 0}\{-\gamma a - \beta x(t) + p(t)(a - d(t))\} \\ &= -\beta x(t) - p(t)d(t) + \max_{a \ge 0}\{a(p(t) - \gamma)\}. \end{aligned}$$

Thus

$$\alpha(t) = \begin{cases} 0 & \text{if } p(t) \le \gamma \\ +\infty & \text{if } p(t) > \gamma. \end{cases}$$

If $\alpha(t) \equiv +\infty$ on some interval, then $P[\alpha(\cdot)] = -\infty$, which is impossible, because there
exists a control with finite payoff. So it follows that $\alpha(\cdot) \equiv 0$ on $[0, s_0]$: we place no orders.
According to (ODE), we have

$$\begin{cases} \dot x(t) = -d(t) & (0 \le t \le s_0) \\ x(0) = x^0. \end{cases}$$

Thus $s_0$ is the first time the inventory hits 0. Now since $x(t) = x^0 - \int_0^t d(s)\, ds$, we have
$x(s_0) = 0$. That is, $\int_0^{s_0} d(s)\, ds = x^0$ and we have hit the constraint. Now use the Pontryagin
Maximum Principle with state constraint for times $t \ge s_0$, with

$$R = \{x \ge 0\} = \{g(x) := -x \le 0\}$$

and

$$c(x, a, t) = \nabla g(x) \cdot f(x, a, t) = (-1)(a - d(t)) = d(t) - a.$$

We have

(M') $$H(x(t), p(t), t, \alpha(t)) = \max_{a \ge 0}\{H(x(t), p(t), t, a) \mid c(x(t), a, t) = 0\}.$$

But $c(x(t), a, t) = 0$ if and only if $a = d(t)$. Then (ODE) reads

$$\dot x(t) = \alpha(t) - d(t) = 0$$

and so $x(t) = 0$ for all times $t \ge s_0$.

We have confirmed that our guess for the optimal strategy was right.
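Here is a discrete-time sketch of the verified strategy, our own illustration with made-up data ($d \equiv 1$, $x^0 = 0.5$, $T = 2$, $\gamma = \beta = 1$). The rule $a = \max(0, d(t) - x/\Delta t)$ orders nothing while stock remains and, once the stock is exhausted, orders exactly the demand; it is our discrete stand-in for "$\alpha = 0$ until $s_0$, then $\alpha = d$", chosen so that an Euler step never drives $x$ below zero. For these data $s_0 = 0.5$ and the continuous cost is $\gamma(T - s_0) + \beta(x^0)^2/2 = 1.5 + 0.125 = 1.625$.

```python
def inventory_cost(policy, d, x0, T, gamma, beta, n=20000):
    """Explicit Euler for x' = a - d(t); returns (total cost, first time the stock is empty)."""
    dt = T / n
    x, cost, s0 = x0, 0.0, None
    for j in range(n):
        t = j * T / n
        a = policy(x, t, dt)
        cost += (gamma * a + beta * x) * dt        # running cost gamma*alpha + beta*x
        if s0 is None and x <= 1e-12:
            s0 = t
        x += (a - d(t)) * dt
    return cost, s0

d = lambda t: 1.0                                   # made-up constant demand
guess = lambda x, t, dt: max(0.0, d(t) - x / dt)    # order only once the stock runs out
as_you_go = lambda x, t, dt: d(t)                   # keep the initial stock, order the demand
early = lambda x, t, dt: 2.0 if t < 0.25 else max(0.0, d(t) - x / dt)   # restock early

cost, s0 = inventory_cost(guess, d, x0=0.5, T=2.0, gamma=1.0, beta=1.0)
print(cost, s0)   # close to 1.625 and 0.5
for alt in (as_you_go, early):
    print(inventory_cost(alt, d, 0.5, 2.0, 1.0, 1.0)[0], ">", cost)
```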





4.9 REFERENCES 


We have mostly followed the books of Fleming-Rishel [F-R], Macki-Strauss [M-S] and 
Knowles [K]. 
Classic references are Pontryagin—Boltyanski-Gamkrelidze—Mishchenko [P-B-G-M] and 
Lee—Markus [L-M]. 
CHAPTER 5: DYNAMIC PROGRAMMING
5.1 Derivation of Bellman's PDE
5.2 Examples
5.3 Relationship with Pontryagin Maximum Principle
5.4 References


5.1 DERIVATION OF BELLMAN’S PDE 


5.1.1 DYNAMIC PROGRAMMING. We begin with a general mathematical principle:
"it is sometimes easier to solve a problem by embedding it in a larger class of problems
and then solving the larger class all at once."

A CALCULUS EXAMPLE. Suppose we wish to calculate the value of the integral

$$\int_0^\infty \frac{\sin x}{x}\, dx.$$

This is pretty hard to do directly, so let us as follows add a parameter $\alpha$ into the integral:

$$I(\alpha) := \int_0^\infty e^{-\alpha x}\frac{\sin x}{x}\, dx.$$

We compute

$$I'(\alpha) = \int_0^\infty (-x)e^{-\alpha x}\frac{\sin x}{x}\, dx = -\int_0^\infty \sin x\, e^{-\alpha x}\, dx = -\frac{1}{\alpha^2 + 1},$$

where we integrated by parts twice to find the last equality. Consequently

$$I(\alpha) = -\arctan\alpha + C,$$

and we must compute the constant $C$. To do so, observe that

$$0 = I(\infty) = -\arctan(\infty) + C = -\frac{\pi}{2} + C,$$

and so $C = \frac{\pi}{2}$. Hence $I(\alpha) = -\arctan\alpha + \frac{\pi}{2}$, and consequently

$$\int_0^\infty \frac{\sin x}{x}\, dx = I(0) = \frac{\pi}{2}.$$
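The formula for $I(\alpha)$ is easy to confirm numerically for $\alpha > 0$, where the integral converges absolutely. The sketch below is our own check, not part of the notes: it evaluates $I(\alpha)$ by composite Simpson's rule on $[0, 40/\alpha]$ (the neglected tail is at most $e^{-40}/40$) and compares the result with $\frac{\pi}{2} - \arctan\alpha$.

```python
import math

def integrand(x, alpha):
    # e^{-alpha x} sin(x)/x, using sin(x)/x -> 1 at the removable singularity x = 0
    return math.exp(-alpha * x) * (1.0 if x == 0.0 else math.sin(x) / x)

def I_numeric(alpha, n=20000):
    """Composite Simpson's rule on [0, 40/alpha]; the neglected tail is below e^{-40}/40."""
    b = 40.0 / alpha
    h = b / n                                   # n must be even
    total = integrand(0.0, alpha) + integrand(b, alpha)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h, alpha)
    return total * h / 3.0

def I_closed(alpha):
    return math.pi / 2 - math.atan(alpha)

for alpha in (0.5, 1.0, 2.0):
    print(alpha, I_numeric(alpha), I_closed(alpha))
```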














We want to adapt some version of this idea to the vastly more complicated setting of
control theory. For this, fix a terminal time $T > 0$ and then look at the controlled dynamics

(ODE) $$\begin{cases} \dot x(s) = f(x(s), \alpha(s)) & (0 < s < T) \\ x(0) = x^0, \end{cases}$$

with the associated payoff functional

(P) $$P[\alpha(\cdot)] = \int_0^T r(x(s), \alpha(s))\, ds + g(x(T)).$$

We embed this into a larger family of similar problems, by varying the starting times
and starting points:

(5.1) $$\begin{cases} \dot x(s) = f(x(s), \alpha(s)) & (t \le s \le T) \\ x(t) = x, \end{cases}$$

with

(5.2) $$P_{x,t}[\alpha(\cdot)] = \int_t^T r(x(s), \alpha(s))\, ds + g(x(T)).$$

Consider the above problems for all choices of starting times $0 \le t \le T$ and all initial
points $x \in \mathbb{R}^n$.





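DEFINITION. For $x \in \mathbb{R}^n$, $0 \le t \le T$, define the value function $v(x,t)$ to be the
greatest payoff possible if we start at $x \in \mathbb{R}^n$ at time $t$. In other words,

(5.3) $$v(x,t) := \sup_{\alpha(\cdot) \in \mathcal{A}} P_{x,t}[\alpha(\cdot)] \qquad (x \in \mathbb{R}^n,\ 0 \le t \le T).$$

Notice then that

(5.4) $$v(x,T) = g(x) \qquad (x \in \mathbb{R}^n).$$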











5.1.2 DERIVATION OF HAMILTON-JACOBI-BELLMAN EQUATION. Our 
first task is to show that the value function v satisfies a certain nonlinear partial differential 
equation. 

Our derivation will be based upon the reasonable principle that “it’s better to be smart 
from the beginning, than to be stupid for a time and then become smart”. We want to 
convert this philosophy of life into mathematics. 


To simplify, we hereafter suppose that the set A of control parameter values is compact. 


THEOREM 5.1 (HAMILTON-JACOBI-BELLMAN EQUATION). Assume that
the value function $v$ is a $C^1$ function of the variables $(x,t)$. Then $v$ solves the nonlinear
partial differential equation

(HJB) $$v_t(x,t) + \max_{a \in A}\{f(x,a) \cdot \nabla_x v(x,t) + r(x,a)\} = 0 \qquad (x \in \mathbb{R}^n,\ 0 \le t < T),$$

with the terminal condition

$$v(x,T) = g(x) \qquad (x \in \mathbb{R}^n).$$





REMARK. We call (HJB) the Hamilton-Jacobi-Bellman equation, and can rewrite it as

(HJB) $$v_t(x,t) + H(x, \nabla_x v) = 0 \qquad (x \in \mathbb{R}^n,\ 0 \le t < T),$$

for the partial differential equations Hamiltonian

$$H(x,p) := \max_{a \in A} H(x,p,a) = \max_{a \in A}\{f(x,a) \cdot p + r(x,a)\}$$

where $x, p \in \mathbb{R}^n$.








Proof. 1. Let $x \in \mathbb{R}^n$, $0 \le t < T$ and let $h > 0$ be given. As always

$$\mathcal{A} = \{\alpha(\cdot) : [0, \infty) \to A \text{ measurable}\}.$$

Pick any parameter $a \in A$ and use the constant control

$$\alpha(\cdot) \equiv a$$

for times $t \le s \le t + h$. The dynamics then arrive at the point $x(t + h)$, where $t + h < T$.
Suppose now at time $t + h$ we switch to an optimal control and use it for the remaining
times $t + h \le s \le T$.

What is the payoff of this procedure? Now for times $t \le s \le t + h$, we have

$$\begin{cases} \dot x(s) = f(x(s), a) \\ x(t) = x. \end{cases}$$

The payoff for this time period is $\int_t^{t+h} r(x(s), a)\, ds$. Furthermore, the payoff incurred from
time $t + h$ to $T$ is $v(x(t + h), t + h)$, according to the definition of the value function $v$.
Hence the total payoff is

$$\int_t^{t+h} r(x(s), a)\, ds + v(x(t + h), t + h).$$

But the greatest possible payoff if we start from $(x,t)$ is $v(x,t)$. Therefore

(5.5) $$v(x,t) \ge \int_t^{t+h} r(x(s), a)\, ds + v(x(t + h), t + h).$$
2. We now want to convert this inequality into a differential form. So we rearrange
(5.5) and divide by $h > 0$:

$$\frac{v(x(t+h), t+h) - v(x,t)}{h} + \frac{1}{h}\int_t^{t+h} r(x(s), a)\, ds \le 0.$$

Let $h \to 0$:

$$v_t(x,t) + \nabla_x v(x(t), t) \cdot \dot x(t) + r(x(t), a) \le 0.$$

But $x(\cdot)$ solves the ODE

$$\begin{cases} \dot x(s) = f(x(s), a) & (t \le s \le t + h) \\ x(t) = x. \end{cases}$$

Employ this above, to discover:

$$v_t(x,t) + f(x,a) \cdot \nabla_x v(x,t) + r(x,a) \le 0.$$

This inequality holds for all control parameters $a \in A$, and consequently

(5.6) $$\max_{a \in A}\{v_t(x,t) + f(x,a) \cdot \nabla_x v(x,t) + r(x,a)\} \le 0.$$

3. We next demonstrate that in fact the maximum above equals zero. To see this,
suppose $\alpha^*(\cdot), x^*(\cdot)$ were optimal for the problem above. Let us utilize the optimal control
$\alpha^*(\cdot)$ for $t \le s \le t + h$. The payoff is

$$\int_t^{t+h} r(x^*(s), \alpha^*(s))\, ds$$

and the remaining payoff is $v(x^*(t + h), t + h)$. Consequently, the total payoff is

$$\int_t^{t+h} r(x^*(s), \alpha^*(s))\, ds + v(x^*(t + h), t + h) = v(x,t).$$

Rearrange and divide by $h$:

$$\frac{v(x^*(t+h), t+h) - v(x,t)}{h} + \frac{1}{h}\int_t^{t+h} r(x^*(s), \alpha^*(s))\, ds = 0.$$

Let $h \to 0$ and suppose $\alpha^*(t) = a^* \in A$. Then

$$v_t(x,t) + \nabla_x v(x,t) \cdot \underbrace{\dot x^*(t)}_{=\,f(x, a^*)} + r(x, a^*) = 0;$$

and therefore

$$v_t(x,t) + f(x, a^*) \cdot \nabla_x v(x,t) + r(x, a^*) = 0$$

for some parameter value $a^* \in A$. This proves (HJB).





5.1.3 THE DYNAMIC PROGRAMMING METHOD
Here is how to use the dynamic programming method to design optimal controls:

Step 1: Solve the Hamilton-Jacobi-Bellman equation, and thereby compute the value
function $v$.

Step 2: Use the value function $v$ and the Hamilton-Jacobi-Bellman PDE to design an
optimal feedback control $\alpha^*(\cdot)$, as follows. Define for each point $x \in \mathbb{R}^n$ and each time
$0 \le t \le T$,

$$\alpha(x,t) = a \in A$$

to be a parameter value where the maximum in (HJB) is attained. In other words, we
select $\alpha(x,t)$ so that

$$v_t(x,t) + f(x, \alpha(x,t)) \cdot \nabla_x v(x,t) + r(x, \alpha(x,t)) = 0.$$

Next we solve the following ODE, assuming $\alpha(\cdot, t)$ is sufficiently regular to let us do so:

(ODE) $$\begin{cases} \dot x^*(s) = f(x^*(s), \alpha(x^*(s), s)) & (t \le s \le T) \\ x^*(t) = x. \end{cases}$$

Finally, define the feedback control

(5.7) $$\alpha^*(s) := \alpha(x^*(s), s).$$

In summary, we design the optimal control this way: If the state of the system is $x$ at time $t$,
use the control which at time $t$ takes on the parameter value $a \in A$ such that the maximum
in (HJB) is obtained.
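Steps 1 and 2 can be imitated on a computer by discretizing time and space and replacing (HJB) by backward induction. The sketch below is our own illustration, not taken from the notes, for the problem $\dot x = \alpha$, $A = \{-1, 0, 1\}$, $r = -|x|$, $g = 0$, $T = 1$, which is Example 1 of §5.2. Taking the space step equal to the time step keeps every move on the grid; the grid half-width 2 and $N = 200$ are arbitrary choices. The table `V[j][i]` approximates $v(x_i, t_j)$, and the maximizing $a$ stored at each node plays the role of the feedback $\alpha(x,t)$ of Step 2.

```python
import math

def backward_induction(N, width=2):
    """Discrete analogue of (HJB) for x' = a, a in {-1, 0, 1}, r = -|x|, g = 0, T = 1.
    Grid: t_j = j/N, x_i = (i - K)/N with K = width*N, so one time step moves one cell."""
    dt = 1.0 / N
    K = width * N
    size = 2 * K + 1
    V = [[0.0] * size for _ in range(N + 1)]       # V[N][i] = g(x_i) = 0
    policy = [[0] * size for _ in range(N)]
    for j in range(N - 1, -1, -1):
        for i in range(size):
            x = (i - K) * dt
            best, best_a = -math.inf, 0
            for a in (0, -1, 1):                    # try "stay" first, so ties keep a = 0
                if 0 <= i + a < size:
                    value = -abs(x) * dt + V[j + 1][i + a]
                    if value > best:
                        best, best_a = value, a
            V[j][i], policy[j][i] = best, best_a
    return V, policy

def v_exact(x, t):
    """Closed-form value function of Example 1 (Regions I-III)."""
    if x >= 1 - t:
        return -(1 - t) / 2 * (2 * x + t - 1)
    if x <= t - 1:
        return -(1 - t) / 2 * (-2 * x + t - 1)
    return -x * x / 2

N = 200
V, policy = backward_induction(N)
K = 2 * N
for x, t in ((0.5, 0.0), (0.8, 0.5), (-0.8, 0.5), (0.2, 0.5)):
    i, j = round(x * N) + K, round(t * N)
    print(x, t, V[j][i], v_exact(x, t), policy[j][i])
```

The discrete values differ from the exact $v$ by $O(1/N)$, because the running cost is summed with a left Riemann sum.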

We demonstrate next that this construction does indeed provide us with an optimal
control.


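THEOREM 5.2 (VERIFICATION OF OPTIMALITY). The control $\alpha^*(\cdot)$ defined
by the construction (5.7) is optimal.

Proof. We have

$$P_{x,t}[\alpha^*(\cdot)] = \int_t^T r(x^*(s), \alpha^*(s))\, ds + g(x^*(T)).$$

Furthermore according to the definition (5.7) of $\alpha(\cdot)$:

$$\begin{aligned} P_{x,t}[\alpha^*(\cdot)] &= \int_t^T \left(-v_t(x^*(s), s) - f(x^*(s), \alpha^*(s)) \cdot \nabla_x v(x^*(s), s)\right) ds + g(x^*(T)) \\ &= -\int_t^T v_t(x^*(s), s) + \nabla_x v(x^*(s), s) \cdot \dot x^*(s)\, ds + g(x^*(T)) \\ &= -\int_t^T \frac{d}{ds} v(x^*(s), s)\, ds + g(x^*(T)) \\ &= -v(x^*(T), T) + v(x^*(t), t) + g(x^*(T)) \\ &= -g(x^*(T)) + v(x^*(t), t) + g(x^*(T)) \\ &= v(x,t) = \sup_{\alpha(\cdot) \in \mathcal{A}} P_{x,t}[\alpha(\cdot)]. \end{aligned}$$

That is,

$$P_{x,t}[\alpha^*(\cdot)] = \sup_{\alpha(\cdot) \in \mathcal{A}} P_{x,t}[\alpha(\cdot)];$$

and so $\alpha^*(\cdot)$ is optimal, as asserted.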


5.2 EXAMPLES
5.2.1 EXAMPLE 1: DYNAMICS WITH THREE VELOCITIES. Let us begin
with a fairly easy problem:

(ODE) $$\begin{cases} \dot x(s) = \alpha(s) & (0 \le t \le s \le 1) \\ x(t) = x, \end{cases}$$

where our set of control parameters is

$$A = \{-1, 0, 1\}.$$

We want to minimize

$$\int_t^1 |x(s)|\, ds,$$

and so take for our payoff functional

(P) $$P_{x,t}[\alpha(\cdot)] = -\int_t^1 |x(s)|\, ds.$$

As our first illustration of dynamic programming, we will compute the value function
$v(x,t)$ and confirm that it does indeed solve the appropriate Hamilton-Jacobi-Bellman
equation. To do this, we first introduce the three regions:
[Figure: optimal path in Region III, from $(t, x)$ to $(1, x - 1 + t)$.]

• Region I = $\{(x,t) \mid x \le t - 1,\ 0 \le t \le 1\}$.
• Region II = $\{(x,t) \mid t - 1 < x < 1 - t,\ 0 \le t \le 1\}$.
• Region III = $\{(x,t) \mid x \ge 1 - t,\ 0 \le t \le 1\}$.


We will consider the three cases as to which region the initial data (x,t) lie within. 


Region III. In this case we should take $\alpha \equiv -1$, to steer as close to the origin 0 as
quickly as possible. (See the picture.) Then

$$v(x,t) = -\text{area under path taken} = -(1 - t)\,\frac{1}{2}\,(x + x + t - 1) = -\frac{1 - t}{2}(2x + t - 1).$$

Region I. In this region, we should take $\alpha \equiv 1$, in which case we can similarly compute

$$v(x,t) = -\frac{1 - t}{2}(-2x + t - 1).$$

Region II. In this region we take $\alpha \equiv \pm 1$, until we hit the origin, after which we take
$\alpha \equiv 0$. We therefore calculate that $v(x,t) = -\frac{x^2}{2}$ in this region.
[Figure: optimal path in Region II, from $(t, x)$ to $(t + x, 0)$.]


Checking the Hamilton-Jacobi-Bellman PDE. Now the Hamilton-Jacobi-Bellman
equation for our problem reads

(5.8) $$v_t + \max_{a \in A}\{f \cdot \nabla_x v + r\} = 0$$

for $f = a$, $r = -|x|$. We rewrite this as

$$v_t + \max_{a = \pm 1, 0}\{a v_x\} - |x| = 0;$$

and so

(HJB) $$v_t + |v_x| - |x| = 0.$$

We must check that the value function $v$, defined explicitly above in Regions I-III, does in
fact solve this PDE, with the terminal condition that $v(x,1) = g(x) = 0$.

Now in Region II $v = -\frac{x^2}{2}$, $v_t = 0$, $v_x = -x$. Hence

$$v_t + |v_x| - |x| = 0 + |-x| - |x| = 0 \quad \text{in Region II},$$

in accordance with (HJB).
In Region III we have

$$v(x,t) = -\frac{1 - t}{2}(2x + t - 1);$$

and therefore

$$v_t = \frac{1}{2}(2x + t - 1) - \frac{1 - t}{2} = x - 1 + t, \qquad v_x = t - 1, \qquad |v_x| = 1 - t \ge 0.$$

Hence

$$v_t + |v_x| - |x| = x - 1 + t + |t - 1| - |x| = 0 \quad \text{in Region III},$$

because $x \ge 0$ there.
Likewise the Hamilton-Jacobi-Bellman PDE holds in Region I.
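The verification above can also be done numerically. In the sketch below (our own check, not from the notes) we code the piecewise formula for $v$, approximate $v_t$ and $v_x$ by central differences at points inside each region, and evaluate the residual $v_t + |v_x| - |x|$ of (HJB). We also compare one-sided slopes across the borderline $x = 1 - t$ between Regions II and III: they agree, so $v_x$ is continuous there, while $v_{xx}$ jumps from $-1$ to $0$.

```python
def v(x, t):
    """Value function of Example 1, Regions I-III."""
    if x >= 1 - t:                                   # Region III
        return -(1 - t) / 2 * (2 * x + t - 1)
    if x <= t - 1:                                   # Region I
        return -(1 - t) / 2 * (-2 * x + t - 1)
    return -x * x / 2                                # Region II

def hjb_residual(x, t, h=1e-6):
    """v_t + |v_x| - |x| with central differences (valid away from the borderlines)."""
    v_t = (v(x, t + h) - v(x, t - h)) / (2 * h)
    v_x = (v(x + h, t) - v(x - h, t)) / (2 * h)
    return v_t + abs(v_x) - abs(x)

for x, t in ((0.9, 0.5), (-0.9, 0.5), (0.2, 0.5), (-0.3, 0.1)):
    print(x, t, hjb_residual(x, t))                  # all close to zero

# one-sided slopes across the borderline x = 1 - t between Regions II and III
t, h = 0.5, 1e-6
x = 1 - t
left = (v(x, t) - v(x - h, t)) / h                   # Region II side
right = (v(x + h, t) - v(x, t)) / h                  # Region III side
print(left, right)                                   # nearly equal: v_x is continuous here
```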


REMARKS. (i) In the example, $v$ is continuously differentiable, but its second derivatives jump across the borderlines
between Region II and Regions I and III: for instance $v_{xx} = -1$ in Region II, whereas $v_{xx} = 0$ in Region III.

(ii) In general, it may not be possible actually to find the optimal feedback control.
For example, reconsider the above problem, but now with $A = \{-1, 1\}$. We still have
$\alpha = \operatorname{sgn}(v_x)$, but there is no optimal control in Region II.





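5.2.2 EXAMPLE 2: ROCKET RAILROAD CAR. Recall that the equations of
motion in this model are

$$\begin{pmatrix} \dot x_1 \\ \dot x_2 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}\alpha, \qquad |\alpha| \le 1$$

and

$$P[\alpha(\cdot)] = -\text{time to reach } (0,0) = -\int_0^\tau 1\, dt = -\tau.$$

To use the method of dynamic programming, we define $v(x_1, x_2)$ to be minus the least
time it takes to get to the origin $(0,0)$, given we start at the point $(x_1, x_2)$.

What is the Hamilton-Jacobi-Bellman equation? Note $v$ does not depend on $t$, and so
we have

$$\max_{a \in A}\{f \cdot \nabla_x v + r\} = 0$$

for

$$A = [-1, 1], \qquad f = \begin{pmatrix} x_2 \\ a \end{pmatrix}, \qquad r = -1.$$

Hence our PDE reads

$$\max_{|a| \le 1}\{x_2 v_{x_1} + a v_{x_2} - 1\} = 0;$$

and consequently

(HJB) $$\begin{cases} x_2 v_{x_1} + |v_{x_2}| - 1 = 0 & \text{in } \mathbb{R}^2 \\ v(0,0) = 0. \end{cases}$$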


Checking the Hamilton-Jacobi-Bellman PDE. We now confirm that $v$ really satisfies (HJB).
For this, define the regions

$$\text{I} := \{(x_1, x_2) \mid x_1 \ge -\tfrac{1}{2}x_2|x_2|\} \quad \text{and} \quad \text{II} := \{(x_1, x_2) \mid x_1 \le -\tfrac{1}{2}x_2|x_2|\}.$$
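A direct computation, the details of which we omit, reveals that

$$v(x) = \begin{cases} -x_2 - 2\left(x_1 + \frac{1}{2}x_2^2\right)^{1/2} & \text{in Region I} \\ x_2 - 2\left(-x_1 + \frac{1}{2}x_2^2\right)^{1/2} & \text{in Region II}. \end{cases}$$

In Region I we compute

$$v_{x_2} = -1 - \left(x_1 + \frac{x_2^2}{2}\right)^{-1/2} x_2, \qquad v_{x_1} = -\left(x_1 + \frac{x_2^2}{2}\right)^{-1/2};$$

and therefore (note that $v_{x_2} \le 0$ in Region I, since $x_1 + \frac{x_2^2}{2} \ge x_2^2$ there when $x_2 < 0$)

$$x_2 v_{x_1} + |v_{x_2}| - 1 = -x_2\left(x_1 + \frac{x_2^2}{2}\right)^{-1/2} + \left[1 + x_2\left(x_1 + \frac{x_2^2}{2}\right)^{-1/2}\right] - 1 = 0.$$

This confirms that our (HJB) equation holds in Region I, and a similar calculation holds
in Region II.

Optimal control. Since

$$\max_{|a| \le 1}\{x_2 v_{x_1} + a v_{x_2} - 1\} = 0,$$

the optimal control is

$$\alpha = \operatorname{sgn} v_{x_2}.$$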
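Here is a minimal numerical check of these formulas (our own sketch, not from the notes). It codes $v$ in Regions I and II, evaluates the residual $x_2 v_{x_1} + |v_{x_2}| - 1$ of (HJB) by central differences at a few points off the switching curve $x_1 = -\frac12 x_2|x_2|$, and reads off the optimal control $\alpha = \operatorname{sgn} v_{x_2}$. From $(1, 0)$, for instance, $-v = 2$: one unit of time with $\alpha = -1$ brings the car to $(\frac12, -1)$ on the switching curve, and one more unit with $\alpha = +1$ brings it to rest at the origin.

```python
import math

def v(x1, x2):
    """Minus the minimal time to reach (0, 0) for x1' = x2, x2' = alpha, |alpha| <= 1."""
    if x1 >= -0.5 * x2 * abs(x2):                         # Region I
        return -x2 - 2.0 * math.sqrt(x1 + 0.5 * x2 * x2)
    return x2 - 2.0 * math.sqrt(-x1 + 0.5 * x2 * x2)      # Region II

def grad(x1, x2, h=1e-6):
    """Central-difference approximation of (v_{x1}, v_{x2})."""
    return ((v(x1 + h, x2) - v(x1 - h, x2)) / (2 * h),
            (v(x1, x2 + h) - v(x1, x2 - h)) / (2 * h))

def hjb_residual(x1, x2):
    v1, v2 = grad(x1, x2)
    return x2 * v1 + abs(v2) - 1.0

def optimal_control(x1, x2):
    return math.copysign(1.0, grad(x1, x2)[1])            # alpha = sgn v_{x2}

for p in ((1.0, 0.0), (-1.0, 0.5), (2.0, -1.0), (-0.5, -1.0)):
    print(p, hjb_residual(*p), optimal_control(*p))
print(v(1.0, 0.0))   # -2.0
```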














5.2.3 EXAMPLE 3: GENERAL LINEAR-QUADRATIC REGULATOR
For this important problem, we are given matrices $M, B, D \in \mathbb{M}^{n \times n}$, $N \in \mathbb{M}^{n \times m}$,
$C \in \mathbb{M}^{m \times m}$; and assume

$B, C, D$ are symmetric and nonnegative definite,

and

$C$ is invertible.

We take the linear dynamics

(ODE) $$\begin{cases} \dot x(s) = Mx(s) + N\alpha(s) & (t \le s \le T) \\ x(t) = x, \end{cases}$$

for which we want to minimize the quadratic cost functional

$$\int_t^T x(s)^T Bx(s) + \alpha(s)^T C\alpha(s)\, ds + x(T)^T Dx(T).$$

So we must maximize the payoff

(P) $$P_{x,t}[\alpha(\cdot)] = -\int_t^T x(s)^T Bx(s) + \alpha(s)^T C\alpha(s)\, ds - x(T)^T Dx(T).$$

The control values are unconstrained, meaning that the control parameter values can range
over all of $A = \mathbb{R}^m$.














We will solve by dynamic programming the problem of designing an optimal control.
To carry out this plan, we first compute the Hamilton-Jacobi-Bellman equation
$$v_t + \max_{a \in \mathbb{R}^m} \{ f \cdot \nabla_x v + r \} = 0,$$
where
$$f = Mx + Na, \qquad r = -x^T B x - a^T C a, \qquad g = -x^T D x.$$
Rewrite:
$$v_t + \max_{a \in \mathbb{R}^m} \{ (\nabla v)^T N a - a^T C a \} + (\nabla v)^T M x - x^T B x = 0. \tag{HJB}$$
We also have the terminal condition
$$v(x, T) = -x^T D x.$$


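Maximization. For what value of the control parameter $a$ is the maximum attained?
To understand this, we define $Q(a) := (\nabla v)^T N a - a^T C a$, and determine where $Q$ has
a maximum by computing the partial derivatives $Q_{a_j}$ for $j = 1, \dots, m$ and setting them
equal to 0. (Since $C$ is symmetric, nonnegative definite and invertible, it is positive definite, so $Q$ is strictly concave and its critical point is its maximum.) This gives the identities
$$Q_{a_j} = \sum_{i=1}^n v_{x_i} N_{ij} - 2 \sum_{i=1}^m a_i C_{ij} = 0.$$
Therefore $(\nabla v)^T N = 2 a^T C$, and then $2 C^T a = N^T \nabla v$. But $C^T = C$. Therefore
$$a = \tfrac12 C^{-1} N^T \nabla_x v.$$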
This is the formula for the optimal feedback control; it will be very useful once we compute
the value function $v$.


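Finding the value function. We insert our formula $a = \frac12 C^{-1} N^T \nabla v$ into (HJB),
and this PDE then reads
$$\begin{cases} v_t + \frac14 (\nabla v)^T N C^{-1} N^T \nabla v + (\nabla v)^T M x - x^T B x = 0 \\ v(x, T) = -x^T D x. \end{cases} \tag{HJB}$$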


Our next move is to guess the form of the solution, namely
$$v(x, t) = x^T K(t) x,$$
provided the symmetric $n \times n$-matrix valued function $K(\cdot)$ is properly selected. Will this
guess work?
Now, since $x^T K(T) x = v(x, T) = -x^T D x$, we must have the terminal condition that
$$K(T) = -D.$$


Next, compute that
$$v_t = x^T \dot{K}(t) x, \qquad \nabla_x v = 2 K(t) x.$$
We insert our guess $v = x^T K(t) x$ into (HJB), and discover that
$$x^T \{ \dot{K}(t) + K(t) N C^{-1} N^T K(t) + 2 K(t) M - B \} x = 0.$$
Look at the expression
$$2 x^T K M x = x^T K M x + [x^T K M x]^T = x^T K M x + x^T M^T K x.$$
Then
$$x^T \{ \dot{K} + K N C^{-1} N^T K + K M + M^T K - B \} x = 0.$$

This identity will hold if $K(\cdot)$ satisfies the matrix Riccati equation
$$\begin{cases} \dot{K}(t) + K(t) N C^{-1} N^T K(t) + K(t) M + M^T K(t) - B = 0 & (0 \le t < T) \\ K(T) = -D. \end{cases} \tag{R}$$
In summary, if we can solve the Riccati equation (R), we can construct an optimal
feedback control
$$\alpha^*(t) = C^{-1} N^T K(t) x(t).$$
Furthermore, (R) in fact does have a solution, as explained for instance in the book of
Fleming-Rishel [F-R].
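The following sketch integrates (R) backward in time with a fixed-step fourth-order Runge-Kutta scheme, using NumPy, and forms the feedback gain $C^{-1} N^T K(t)$. The function names, the step count and the scalar test data are illustrative assumptions, not part of the notes. For $M = B = 0$ and $N = C = D = 1$, (R) reduces to $\dot K = -K^2$, $K(T) = -1$, whose exact solution $K(t) = -1/(1 + T - t)$ gives a check.

```python
import numpy as np

def riccati_rhs(K, M, N, B, C_inv):
    """dK/dt from (R): K' = -(K N C^{-1} N^T K + K M + M^T K - B)."""
    return -(K @ N @ C_inv @ N.T @ K + K @ M + M.T @ K - B)

def solve_riccati(M, N, B, C, D, T, steps=1000):
    """Integrate (R) backward from K(T) = -D; return increasing times and K values."""
    C_inv = np.linalg.inv(C)
    h = T / steps
    K = -np.array(D, dtype=float)
    Ks = [K]
    for _ in range(steps):
        # one RK4 step of size -h (march from t to t - h)
        k1 = riccati_rhs(K, M, N, B, C_inv)
        k2 = riccati_rhs(K - 0.5 * h * k1, M, N, B, C_inv)
        k3 = riccati_rhs(K - 0.5 * h * k2, M, N, B, C_inv)
        k4 = riccati_rhs(K - h * k3, M, N, B, C_inv)
        K = K - (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        Ks.append(K)
    times = np.linspace(T, 0.0, steps + 1)
    return times[::-1], Ks[::-1]       # reorder so that times run from 0 to T

def feedback_gain(K, N, C):
    """Matrix G(t) with alpha*(t) = G(t) x(t), i.e. G = C^{-1} N^T K(t)."""
    return np.linalg.solve(C, N.T @ K)
```

With the scalar data above and $T = 1$, `solve_riccati` returns $K(0) \approx -0.5$, so the optimal feedback at time $0$ is $\alpha^* = -\frac12 x$.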
5.3 DYNAMIC PROGRAMMING AND THE PONTRYAGIN MAXIMUM
PRINCIPLE 


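5.3.1 THE METHOD OF CHARACTERISTICS.
Assume $H : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ and consider this initial-value problem for the Hamilton-Jacobi equation:
$$\begin{cases} u_t(x, t) + H(x, \nabla_x u(x, t)) = 0 & (x \in \mathbb{R}^n, \ 0 < t < T) \\ u(x, 0) = g(x). \end{cases} \tag{HJ}$$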

A basic idea in PDE theory is to introduce some ordinary differential equations, the 
solution of which lets us compute the solution u. In particular, we want to find a curve 
x(-) along which we can, in principle at least, compute u(x, t). 

This section discusses this method of characteristics, to make clearer the connections 
between PDE theory and the Pontryagin Maximum Principle. 


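NOTATION. We write $x(t) = (x^1(t), \dots, x^n(t))$ and $p(t) = \nabla_x u(x(t), t) = (p^1(t), \dots, p^n(t))$.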


Derivation of characteristic equations. We have
$$p^k(t) = u_{x_k}(x(t), t),$$
and therefore
$$\dot{p}^k(t) = u_{x_k t}(x(t), t) + \sum_{i=1}^n u_{x_k x_i}(x(t), t)\, \dot{x}^i(t).$$


Now suppose $u$ solves (HJ). We differentiate this PDE with respect to the variable $x_k$:
$$u_{t x_k}(x, t) = -H_{x_k}(x, \nabla_x u(x, t)) - \sum_{i=1}^n H_{p_i}(x, \nabla_x u(x, t))\, u_{x_k x_i}(x, t).$$
Let $x = x(t)$ and substitute above, using $\nabla_x u(x(t), t) = p(t)$:
$$\dot{p}^k(t) = -H_{x_k}(x(t), p(t)) + \sum_{i=1}^n \left( \dot{x}^i(t) - H_{p_i}(x(t), p(t)) \right) u_{x_k x_i}(x(t), t).$$


We can simplify this expression if we select $x(\cdot)$ so that
$$\dot{x}^i(t) = H_{p_i}(x(t), p(t)) \qquad (1 \le i \le n);$$
then
$$\dot{p}^k(t) = -H_{x_k}(x(t), p(t)) \qquad (1 \le k \le n).$$
These are Hamilton's equations, already discussed in a different context in §4.1:
$$\begin{cases} \dot{x}(t) = \nabla_p H(x(t), p(t)) \\ \dot{p}(t) = -\nabla_x H(x(t), p(t)). \end{cases} \tag{H}$$


We next demonstrate that if we can solve (H), then this gives a solution to PDE (HJ),
satisfying the initial condition $u = g$ on $t = 0$. Set $p^0 = \nabla g(x^0)$. We solve (H), with
$x(0) = x^0$ and $p(0) = p^0$. Next, let us calculate
$$\begin{aligned} \frac{d}{dt} u(x(t), t) &= u_t(x(t), t) + \nabla_x u(x(t), t) \cdot \dot{x}(t) \\ &= -H(x(t), \nabla_x u(x(t), t)) + \nabla_x u(x(t), t) \cdot \nabla_p H(x(t), p(t)) \\ &= -H(x(t), p(t)) + p(t) \cdot \nabla_p H(x(t), p(t)). \end{aligned}$$
Note also $u(x(0), 0) = u(x^0, 0) = g(x^0)$. Integrate, to compute $u$ along the curve $x(\cdot)$:
$$u(x(t), t) = \int_0^t -H + \nabla_p H \cdot p \, ds + g(x^0),$$
which gives us the solution, once we have calculated $x(\cdot)$ and $p(\cdot)$.
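To see the method at work, here is a small sketch for the one-dimensional, illustrative choice $H(x, p) = \frac12 p^2$ and $g(x) = \frac12 x^2$ (our own example, not from the notes). Hamilton's equations (H) and the formula for $\frac{d}{dt} u(x(t), t)$ are integrated together with a classical RK4 scheme. For this data the Hopf-Lax formula gives the exact solution $u(x, t) = x^2 / (2(1 + t))$, which we use as a check.

```python
def characteristic(x0, t, H, H_x, H_p, g, g_prime, steps=200):
    """Follow one characteristic of u_t + H(x, u_x) = 0, u(x, 0) = g(x).

    Integrates x' = H_p, p' = -H_x and (d/dt) u = -H + p * H_p with RK4,
    starting from x(0) = x0, p(0) = g'(x0), u = g(x0). Returns (x(t), u(x(t), t)).
    """
    def rhs(state):
        x, p, _ = state
        return (H_p(x, p), -H_x(x, p), -H(x, p) + p * H_p(x, p))

    state = (x0, g_prime(x0), g(x0))
    h = t / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
        k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
        state = tuple(s + h / 6.0 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    x, _, u = state
    return x, u

# Illustrative data: H(x, p) = p^2 / 2, g(x) = x^2 / 2.
def H(x, p): return 0.5 * p * p
def H_x(x, p): return 0.0
def H_p(x, p): return p
def g(x): return 0.5 * x * x
def g_prime(x): return x

x_end, u_end = characteristic(0.8, 1.5, H, H_x, H_p, g, g_prime)
exact = x_end ** 2 / (2.0 * (1.0 + 1.5))
```

Here the characteristic is the straight line $x(t) = x^0 (1 + t)$, so starting from $x^0 = 0.8$ we reach $x(1.5) = 2$, where $u = 0.8$.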


5.3.2 CONNECTIONS BETWEEN DYNAMIC PROGRAMMING AND THE 
PONTRYAGIN MAXIMUM PRINCIPLE. 
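Return now to our usual control theory problem, with dynamics
$$\begin{cases} \dot{x}(s) = f(x(s), \alpha(s)) & (t \le s \le T) \\ x(t) = x \end{cases} \tag{ODE}$$
and payoff
$$P_{x,t}[\alpha(\cdot)] = \int_t^T r(x(s), \alpha(s))\, ds + g(x(T)). \tag{P}$$
As above, the value function is
$$v(x, t) = \sup_{\alpha(\cdot)} P_{x,t}[\alpha(\cdot)].$$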


The next theorem demonstrates that the costate in the Pontryagin Maximum Principle 


is in fact the gradient in x of the value function v, taken along an optimal trajectory: 
THEOREM 5.3 (COSTATES AND GRADIENTS). Assume $\alpha^*(\cdot), x^*(\cdot)$ solve the
control problem (ODE), (P).
If the value function $v$ is $C^2$, then the costate $p^*(\cdot)$ occurring in the Maximum Principle
is given by
$$p^*(s) = \nabla_x v(x^*(s), s) \qquad (t \le s \le T).$$


Proof. 1. As usual, suppress the superscript *. Define $p(t) := \nabla_x v(x(t), t)$.
We claim that $p(\cdot)$ satisfies conditions (ADJ) and (M) of the Pontryagin Maximum
Principle. To confirm this assertion, look at
$$\dot{p}^i(t) = \frac{d}{dt} v_{x_i}(x(t), t) = v_{x_i t}(x(t), t) + \sum_{j=1}^n v_{x_i x_j}(x(t), t)\, \dot{x}^j(t).$$
We know $v$ solves
$$v_t(x, t) + \max_{a \in A} \{ f(x, a) \cdot \nabla_x v(x, t) + r(x, a) \} = 0;$$
and, applying the optimal control $\alpha(\cdot)$, we find:
$$v_t(x(t), t) + f(x(t), \alpha(t)) \cdot \nabla_x v(x(t), t) + r(x(t), \alpha(t)) = 0.$$


2. Now freeze the time $t$ and define the function
$$h(x) := v_t(x, t) + f(x, \alpha(t)) \cdot \nabla_x v(x, t) + r(x, \alpha(t)) \le 0.$$
Observe that $h(x(t)) = 0$. Consequently $h(\cdot)$ has a maximum at the point $x = x(t)$; and
therefore for $i = 1, \dots, n$,
$$0 = h_{x_i}(x(t)) = v_{t x_i}(x(t), t) + f_{x_i}(x(t), \alpha(t)) \cdot \nabla_x v(x(t), t) + f(x(t), \alpha(t)) \cdot \nabla_x v_{x_i}(x(t), t) + r_{x_i}(x(t), \alpha(t)).$$


Substitute above:
$$\dot{p}^i(t) = v_{x_i t} + \sum_{j=1}^n v_{x_i x_j} f^j = v_{x_i t} + f \cdot \nabla_x v_{x_i} = -f_{x_i} \cdot \nabla_x v - r_{x_i}.$$
Recalling that $p(t) = \nabla_x v(x(t), t)$, we deduce that
$$\dot{p}(t) = -(\nabla_x f) p - \nabla_x r.$$
Recall also
$$H = f \cdot p + r, \qquad \nabla_x H = (\nabla_x f) p + \nabla_x r.$$
Hence
$$\dot{p}(t) = -\nabla_x H(x(t), p(t), \alpha(t)),$$
which is (ADJ).


3. Now we must check condition (M). According to (HJB),
$$v_t(x(t), t) + \max_{a \in A} \{ f(x(t), a) \cdot \underbrace{\nabla_x v(x(t), t)}_{p(t)} + r(x(t), a) \} = 0,$$
and the maximum occurs for $a = \alpha(t)$. Hence
$$\max_{a \in A} \{ H(x(t), p(t), a) \} = H(x(t), p(t), \alpha(t));$$











and this is assertion (M) of the Maximum Principle. 
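Theorem 5.3 can be checked numerically on the scalar linear-quadratic regulator of §5.2.3. There $v = K(t) x^2$, so the theorem predicts $p(t) = 2 K(t) x(t)$, and with $H = (Mx + Na)p - Bx^2 - Ca^2$ the adjoint equation (ADJ) reads $\dot p = -H_x = 2Bx - Mp$. The sketch below (the data $M, N, B, C, D, T, x^0$ and the step count are illustrative assumptions) integrates the scalar form of (R) backward and the optimal state forward with explicit Euler steps, then compares a finite-difference $\dot p$ with $-H_x$.

```python
def costate_check(M=0.7, N=1.0, B=0.5, C=2.0, D=1.0, T=1.0, x0=1.0, steps=20000):
    """Return max |p'(t) + H_x| along the optimal scalar LQR path, with p = 2 K x."""
    h = T / steps
    # scalar (R): K' = -(K^2 N^2 / C + 2 K M - B), K(T) = -D; explicit Euler steps backward
    K = [0.0] * (steps + 1)
    K[steps] = -D
    for j in range(steps, 0, -1):
        dK = -(K[j] ** 2 * N ** 2 / C + 2.0 * K[j] * M - B)
        K[j - 1] = K[j] - h * dK
    # forward Euler march of the state under the feedback alpha = C^{-1} N K x
    x = [x0] + [0.0] * steps
    for j in range(steps):
        alpha = N * K[j] * x[j] / C
        x[j + 1] = x[j] + h * (M * x[j] + N * alpha)
    p = [2.0 * K[j] * x[j] for j in range(steps + 1)]   # candidate costate grad_x v
    worst = 0.0
    for j in range(1, steps):
        p_dot = (p[j + 1] - p[j - 1]) / (2.0 * h)       # centered difference
        H_x = M * p[j] - 2.0 * B * x[j]                 # dH/dx at (x, p, alpha)
        worst = max(worst, abs(p_dot + H_x))
    return worst
```

The returned mismatch shrinks proportionally to the step size, consistent with $p = \nabla_x v$ solving (ADJ) exactly.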





INTERPRETATIONS. The foregoing provides us with another way to look at transversality conditions:

(i) Free endpoint problem: Recall that we stated earlier in Theorem 4.4 that for the
free endpoint problem we have the condition
$$p^*(T) = \nabla g(x^*(T)) \tag{T}$$
for the payoff functional
$$\int_t^T r(x(s), \alpha(s))\, ds + g(x(T)).$$


To understand this better, note $p^*(s) = \nabla_x v(x^*(s), s)$. But $v(x, T) = g(x)$, and hence the
foregoing implies
$$p^*(T) = \nabla_x v(x^*(T), T) = \nabla g(x^*(T)).$$


(ii) Constrained initial and target sets:
Recall that for this problem we stated in Theorem 4.5 the transversality conditions that
$$\begin{cases} p^*(0) \text{ is perpendicular to } T_0 \\ p^*(\tau^*) \text{ is perpendicular to } T_1 \end{cases} \tag{T}$$
when $\tau^*$ denotes the first time the optimal trajectory hits the target set $X_1$.
Now let $v$ be the value function for this problem:
$$v(x) = \sup_{\alpha(\cdot)} P_x[\alpha(\cdot)],$$
with the constraint that we start at $x^0 \in X_0$ and end at $x^1 \in X_1$. But then $v$ will be
constant on the set $X_0$ and also constant on $X_1$. Since $\nabla v$ is perpendicular to any level
surface, $\nabla v$ is therefore perpendicular to both $\partial X_0$ and $\partial X_1$. And since
$$p^*(t) = \nabla v(x^*(t)),$$
this means that
$$\begin{cases} p^* \text{ is perpendicular to } \partial X_0 \text{ at } t = 0, \\ p^* \text{ is perpendicular to } \partial X_1 \text{ at } t = \tau^*. \end{cases}$$














5.4 REFERENCES 


See the book [B-CD] by M. Bardi and I. Capuzzo-Dolcetta for more about the modern 
theory of PDE methods in dynamic programming. Barron and Jensen present in [B-J] a 
proof of Theorem 5.3 that does not require $v$ to be $C^2$.
CHAPTER 6: DIFFERENTIAL GAMES


6.1 Definitions 

6.2 Dynamic Programming 

6.3 Games and the Pontryagin Maximum Principle 
6.4 Application: war of attrition and attack 

6.5 References 


6.1 DEFINITIONS 


We introduce in this section a model for a two-person, zero-sum differential game. The 
basic idea is that two players control the dynamics of some evolving system, and one tries 
to maximize, the other to minimize, a payoff functional that depends upon the trajectory. 

What are optimal strategies for each player? This is a very tricky question, primarily 
since at each moment of time, each player’s control decisions will depend upon what the 
other has done previously. 





A MODEL PROBLEM. Let a time $0 \le t < T$ be given, along with sets $A \subseteq \mathbb{R}^m$,
$B \subseteq \mathbb{R}^l$ and a function $f : \mathbb{R}^n \times A \times B \to \mathbb{R}^n$.



































DEFINITION. A measurable mapping $\alpha(\cdot) : [t, T] \to A$ is a control for player I (starting
at time $t$). A measurable mapping $\beta(\cdot) : [t, T] \to B$ is a control for player II.

Corresponding to each pair of controls, we have corresponding dynamics:
$$\begin{cases} \dot{x}(s) = f(x(s), \alpha(s), \beta(s)) & (t \le s \le T) \\ x(t) = x, \end{cases} \tag{ODE}$$
the initial point $x \in \mathbb{R}^n$ being given.


DEFINITION. The payoff of the game is
$$P_{x,t}[\alpha(\cdot), \beta(\cdot)] := \int_t^T r(x(s), \alpha(s), \beta(s))\, ds + g(x(T)). \tag{P}$$
Player I, whose control is $\alpha(\cdot)$, wants to maximize the payoff functional $P[\cdot]$. Player II
has the control $\beta(\cdot)$ and wants to minimize $P[\cdot]$. This is a two-person, zero-sum differential
game.


We intend now to define value functions and to study the game using dynamic programming.
DEFINITION. The sets of controls for the game are
$$\mathcal{A}(t) := \{ \alpha(\cdot) : [t, T] \to A, \ \alpha(\cdot) \text{ measurable} \},$$
$$\mathcal{B}(t) := \{ \beta(\cdot) : [t, T] \to B, \ \beta(\cdot) \text{ measurable} \}.$$
We need to model the fact that at each time instant, neither player knows the other's
future moves. We will use the concept of strategies, as employed by Varaiya and Elliott-Kalton. The idea is that one player will select in advance, not his control, but rather his
responses to all possible controls that could be selected by his opponent.


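DEFINITIONS. (i) A mapping $\Phi : \mathcal{B}(t) \to \mathcal{A}(t)$ is called a strategy for player I if for
all times $t \le s \le T$,
$$\beta(\tau) \equiv \hat{\beta}(\tau) \quad \text{for } t \le \tau \le s$$
implies
$$\Phi[\beta](\tau) \equiv \Phi[\hat{\beta}](\tau) \quad \text{for } t \le \tau \le s. \tag{6.1}$$
We can think of $\Phi[\beta]$ as the response of player I to player II's selection of control $\beta(\cdot)$.
Condition (6.1) expresses that player I cannot foresee the future.
(ii) A strategy for II is a mapping $\Psi : \mathcal{A}(t) \to \mathcal{B}(t)$ such that for all times $t \le s \le T$,
$$\alpha(\tau) \equiv \hat{\alpha}(\tau) \quad \text{for } t \le \tau \le s$$
implies
$$\Psi[\alpha](\tau) \equiv \Psi[\hat{\alpha}](\tau) \quad \text{for } t \le \tau \le s.$$
DEFINITION. The sets of strategies are
$$\mathbf{A}(t) := \text{strategies for player I (starting at } t),$$
$$\mathbf{B}(t) := \text{strategies for player II (starting at } t).$$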


Finally, we introduce value functions:
DEFINITION. The lower value function is
$$v(x, t) := \inf_{\Psi \in \mathbf{B}(t)} \sup_{\alpha(\cdot) \in \mathcal{A}(t)} P_{x,t}[\alpha(\cdot), \Psi[\alpha](\cdot)], \tag{6.2}$$
and the upper value function is
$$u(x, t) := \sup_{\Phi \in \mathbf{A}(t)} \inf_{\beta(\cdot) \in \mathcal{B}(t)} P_{x,t}[\Phi[\beta](\cdot), \beta(\cdot)]. \tag{6.3}$$
One of the two players announces his strategy in response to the other's choice of control,
the other player chooses the control. The player who "plays second", i.e., who chooses the
strategy, has an advantage. In fact, it turns out that always
$$v(x, t) \le u(x, t).$$
6.2 DYNAMIC PROGRAMMING, ISAACS’ EQUATIONS 
THEOREM 6.1 (PDE FOR THE UPPER AND LOWER VALUE FUNCTIONS). Assume $u, v$ are continuously differentiable. Then $u$ solves the upper Isaacs' equation
$$\begin{cases} u_t + \min_{b \in B} \max_{a \in A} \{ f(x, a, b) \cdot \nabla_x u(x, t) + r(x, a, b) \} = 0 \\ u(x, T) = g(x), \end{cases} \tag{6.4}$$
and $v$ solves the lower Isaacs' equation
$$\begin{cases} v_t + \max_{a \in A} \min_{b \in B} \{ f(x, a, b) \cdot \nabla_x v(x, t) + r(x, a, b) \} = 0 \\ v(x, T) = g(x). \end{cases} \tag{6.5}$$


Isaacs' equations are analogs of the Hamilton-Jacobi-Bellman equation in two-person,
zero-sum control theory. We can rewrite these in the forms
$$u_t + H^+(x, \nabla_x u) = 0$$
for the upper PDE Hamiltonian
$$H^+(x, p) := \min_{b \in B} \max_{a \in A} \{ f(x, a, b) \cdot p + r(x, a, b) \};$$
and
$$v_t + H^-(x, \nabla_x v) = 0$$
for the lower PDE Hamiltonian
$$H^-(x, p) := \max_{a \in A} \min_{b \in B} \{ f(x, a, b) \cdot p + r(x, a, b) \}.$$

INTERPRETATIONS AND REMARKS. (i) In general, we have
$$\max_{a \in A} \min_{b \in B} \{ f(x, a, b) \cdot p + r(x, a, b) \} \le \min_{b \in B} \max_{a \in A} \{ f(x, a, b) \cdot p + r(x, a, b) \},$$
and consequently $H^-(x, p) \le H^+(x, p)$. The upper and lower Isaacs' equations are then
different PDE and so in general the upper and lower value functions are not the same:
$$u \ne v.$$
The precise interpretation of this is tricky, but the idea is to think of a slightly different
situation in which the two players take turns exerting their controls over short time intervals. In this situation, it is a disadvantage to go first, since the other player then knows
what control is selected. The value function $u$ represents a sort of "infinitesimal" version
of this situation, for which player I has the advantage. The value function $v$ represents the
reverse situation, for which player II has the advantage.
If however
$$\max_{a \in A} \min_{b \in B} \{ f(x, a, b) \cdot p + r(x, a, b) \} = \min_{b \in B} \max_{a \in A} \{ f(x, a, b) \cdot p + r(x, a, b) \} \tag{6.6}$$
for all $p, x$, we say the game satisfies the minimax condition, also called Isaacs' condition.
In this case it turns out that $u \equiv v$ and we say the game has value.


(ii) As in dynamic programming from control theory, if (6.6) holds, we can solve Isaacs' equation
for $u \equiv v$ and then, at least in principle, design optimal controls for I and II.

(iii) To say that $\alpha^*(\cdot), \beta^*(\cdot)$ are optimal means that the pair $(\alpha^*(\cdot), \beta^*(\cdot))$ is a saddle
point for $P_{x,t}$. This means
$$P_{x,t}[\alpha(\cdot), \beta^*(\cdot)] \le P_{x,t}[\alpha^*(\cdot), \beta^*(\cdot)] \le P_{x,t}[\alpha^*(\cdot), \beta(\cdot)] \tag{6.7}$$
for all controls $\alpha(\cdot), \beta(\cdot)$. Player I will select $\alpha^*(\cdot)$ because he is afraid II will play $\beta^*(\cdot)$.
Player II will play $\beta^*(\cdot)$, because she is afraid I will play $\alpha^*(\cdot)$.
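The gap between max-min and min-max in remark (i) is easy to see on a grid. In this sketch the function $\varphi(a, b) = (a - b)^2$ on $A = B = [0, 1]$ is a made-up stand-in for $f(x, a, b) \cdot p + r(x, a, b)$ at a frozen $(x, p)$. If the maximizer commits first, the minimizer answers with $b = a$, so max-min is $0$; if the minimizer commits first, the best she can do is $b = \frac12$, so min-max is $\frac14$. A separable function $\varphi(a, b) = \phi_1(a) + \phi_2(b)$ (also made up) gives equality, which is the minimax condition (6.6).

```python
def max_min(phi, A, B):
    """max over a of min over b of phi(a, b): the maximizer commits first."""
    return max(min(phi(a, b) for b in B) for a in A)

def min_max(phi, A, B):
    """min over b of max over a of phi(a, b): the minimizer commits first."""
    return min(max(phi(a, b) for a in A) for b in B)

grid = [i / 100 for i in range(101)]            # discretization of [0, 1]

def coupled(a, b):                              # a and b interact
    return (a - b) ** 2

def separable(a, b):                            # a and b occur in separate terms
    return (3 * a - a * a) - 2 * b

lower, upper = max_min(coupled, grid, grid), min_max(coupled, grid, grid)
```

Here `lower` is `0.0` and `upper` is `0.25`, while for `separable` both orders give the same number.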














6.3 GAMES AND THE PONTRYAGIN MAXIMUM PRINCIPLE 


Assume the minimax condition (6.6) holds and we design optimal $\alpha^*(\cdot), \beta^*(\cdot)$ as above.
Let $x^*(\cdot)$ denote the solution of the game dynamics, corresponding to our controls $\alpha^*(\cdot), \beta^*(\cdot)$.
Then define
$$p^*(t) := \nabla_x v(x^*(t), t) = \nabla_x u(x^*(t), t).$$
It turns out that
$$\dot{p}^*(t) = -\nabla_x H(x^*(t), p^*(t), \alpha^*(t), \beta^*(t)) \tag{ADJ}$$
for the game-theory Hamiltonian
$$H(x, p, a, b) := f(x, a, b) \cdot p + r(x, a, b).$$


6.4 APPLICATION: WAR OF ATTRITION AND ATTACK. 
In this section we work out an example, due to R. Isaacs [I]. 


6.4.1 STATEMENT OF PROBLEM. We assume that two opponents I and II are
at war with each other. Let us define
$$x^1(t) = \text{supply of resources for I}, \qquad x^2(t) = \text{supply of resources for II}.$$
Each player at each time can devote some fraction of his/her efforts to direct attack, and
the remaining fraction to attrition (= guerrilla warfare). Set $A = B = [0, 1]$, and define

$\alpha(t)$ = fraction of I's effort devoted to attrition
$1 - \alpha(t)$ = fraction of I's effort devoted to attack
$\beta(t)$ = fraction of II's effort devoted to attrition
$1 - \beta(t)$ = fraction of II's effort devoted to attack.

We introduce as well the parameters

$m_1$ = rate of production of war material for I
$m_2$ = rate of production of war material for II
$c_1$ = effectiveness of II's weapons against I's production
$c_2$ = effectiveness of I's weapons against II's production.

We will assume
$$c_2 > c_1,$$
a hypothesis that introduces an asymmetry into the problem.
The dynamics are governed by the system of ODE
$$\begin{cases} \dot{x}^1(t) = m_1 - c_1 \beta(t) x^2(t) \\ \dot{x}^2(t) = m_2 - c_2 \alpha(t) x^1(t). \end{cases} \tag{6.8}$$


Let us finally introduce the payoff functional
$$P[\alpha(\cdot), \beta(\cdot)] = \int_0^T (1 - \alpha(t)) x^1(t) - (1 - \beta(t)) x^2(t)\, dt,$$
the integrand recording the advantage of I over II from direct attacks at time $t$. Player I
wants to maximize $P$, and player II wants to minimize $P$.


6.4.2 APPLYING DYNAMIC PROGRAMMING. First, we check the minimax
condition, for $n = 2$, $p = (p_1, p_2)$:
$$\begin{aligned} f(x, a, b) \cdot p + r(x, a, b) &= (m_1 - c_1 b x_2) p_1 + (m_2 - c_2 a x_1) p_2 + (1 - a) x_1 - (1 - b) x_2 \\ &= m_1 p_1 + m_2 p_2 + x_1 - x_2 + a(-x_1 - c_2 x_1 p_2) + b(x_2 - c_1 x_2 p_1). \end{aligned}$$
Since $a$ and $b$ occur in separate terms, the minimax condition holds. Therefore $v \equiv u$ and
the two forms of the Isaacs' equations agree:
$$v_t + H(x, \nabla_x v) = 0,$$
for
$$H(x, p) := H^+(x, p) = H^-(x, p).$$


We recall $A = B = [0, 1]$ and $p = \nabla_x v$, and then choose $a \in [0, 1]$ to maximize
$$a x_1 (-1 - c_2 v_{x_2}).$$
Likewise, we select $b \in [0, 1]$ to minimize
$$b x_2 (1 - c_1 v_{x_1}).$$
Thus
$$\alpha = \begin{cases} 1 & \text{if } -1 - c_2 v_{x_2} \ge 0 \\ 0 & \text{if } -1 - c_2 v_{x_2} < 0, \end{cases} \tag{6.9}$$
and
$$\beta = \begin{cases} 0 & \text{if } 1 - c_1 v_{x_1} \ge 0 \\ 1 & \text{if } 1 - c_1 v_{x_1} < 0. \end{cases} \tag{6.10}$$
So if we knew the value function $v$, we could then design optimal feedback controls for I and
II.
It is however hard to solve Isaacs' equation for $v$, and so we switch approaches.


6.4.3 APPLYING THE MAXIMUM PRINCIPLE. Assume $\alpha(\cdot), \beta(\cdot)$ are selected as above, and $x(\cdot)$ is the corresponding solution of the ODE (6.8). Define
$$p(t) := \nabla_x v(x(t), t).$$
By results stated above, $p(\cdot)$ solves the adjoint equation
$$\dot{p}(t) = -\nabla_x H(x(t), p(t), \alpha(t), \beta(t)) \tag{6.11}$$
for
$$\begin{aligned} H(x, p, a, b) &= p \cdot f(x, a, b) + r(x, a, b) \\ &= p_1 (m_1 - c_1 b x_2) + p_2 (m_2 - c_2 a x_1) + (1 - a) x_1 - (1 - b) x_2. \end{aligned}$$
Therefore (6.11) reads
$$\begin{cases} \dot{p}^1 = \alpha - 1 + p^2 c_2 \alpha \\ \dot{p}^2 = 1 - \beta + p^1 c_1 \beta, \end{cases} \tag{6.12}$$
with the terminal conditions $p^1(T) = p^2(T) = 0$.
We introduce the further notation
$$s^1 := -1 - c_2 v_{x_2} = -1 - c_2 p^2, \qquad s^2 := 1 - c_1 v_{x_1} = 1 - c_1 p^1;$$
so that, according to (6.9) and (6.10), the functions $s^1$ and $s^2$ control when player I and
player II switch their controls.

Dynamics for $s^1$ and $s^2$. Our goal now is to find ODE for $s^1, s^2$. We compute
$$\dot{s}^1 = -c_2 \dot{p}^2 = c_2(-1 + \beta - p^1 c_1 \beta) = c_2(-1 + \beta(1 - p^1 c_1)) = c_2(-1 + \beta s^2)$$
and
$$\dot{s}^2 = -c_1 \dot{p}^1 = c_1(1 - \alpha - p^2 c_2 \alpha) = c_1(1 + \alpha(-1 - p^2 c_2)) = c_1(1 + \alpha s^1).$$
Therefore
$$\begin{cases} \dot{s}^1 = c_2(-1 + \beta s^2), & s^1(T) = -1 \\ \dot{s}^2 = c_1(1 + \alpha s^1), & s^2(T) = 1. \end{cases} \tag{6.13}$$


Recall from (6.9) and (6.10) that
$$\alpha = \begin{cases} 1 & \text{if } s^1 \ge 0 \\ 0 & \text{if } s^1 < 0, \end{cases} \qquad \beta = \begin{cases} 1 & \text{if } s^2 < 0 \\ 0 & \text{if } s^2 \ge 0. \end{cases}$$
Consequently, if we can find $s^1, s^2$, then we can construct the optimal controls $\alpha$ and $\beta$.

Calculating $s^1$ and $s^2$. We work backwards from the terminal time $T$. Since at
time $T$ we have $s^1 < 0$ and $s^2 > 0$, the same inequalities hold near $T$. Hence we have
$\alpha = \beta = 0$ near $T$, meaning a full attack from both sides.


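Next, let $t^* < T$ be the first time going backward from $T$ at which either I or II
switches strategy. Our intention is to compute $t^*$. On the time interval $[t^*, T]$, we have
$\alpha(\cdot) \equiv \beta(\cdot) \equiv 0$. Thus (6.13) gives
$$\dot{s}^1 = -c_2, \quad s^1(T) = -1, \qquad \dot{s}^2 = c_1, \quad s^2(T) = 1;$$
and therefore
$$s^1(t) = -1 + c_2(T - t), \qquad s^2(t) = 1 + c_1(t - T)$$
for times $t^* \le t \le T$. Hence $s^1$ hits 0 at time $T - \frac{1}{c_2}$, and $s^2$ hits 0 at time $T - \frac{1}{c_1}$. Remember
that we are assuming $c_2 > c_1$. Then $T - \frac{1}{c_1} < T - \frac{1}{c_2}$, and hence
$$t^* = T - \frac{1}{c_2}.$$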
Now define $t_* < t^*$ to be the next time going backward when player I or player II
switches. On the time interval $[t_*, t^*]$, we have $\alpha \equiv 1$, $\beta \equiv 0$. Therefore the dynamics read:
$$\begin{cases} \dot{s}^1 = -c_2, & s^1(t^*) = 0 \\ \dot{s}^2 = c_1(1 + s^1), & s^2(t^*) = 1 - \frac{c_1}{c_2}. \end{cases}$$
We solve these equations and discover that
$$\begin{cases} s^1(t) = -1 + c_2(T - t) \\ s^2(t) = 1 - \frac{c_1}{2 c_2} - \frac{c_1 c_2}{2} (t - T)^2 \end{cases} \qquad (t_* \le t \le t^*).$$
Now $s^1 > 0$ on $[t_*, t^*)$ for all choices of $t_*$. But $s^2 = 0$ at
$$t_* = T - \frac{1}{c_2} \left( \frac{2 c_2}{c_1} - 1 \right)^{1/2}.$$
If we now solve (6.13) on $[0, t_*]$ with $\alpha = \beta = 1$, we learn that $s^1, s^2$ do not change sign.
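The switching times can also be found by integrating (6.13) backward from $T$ and letting the feedback rules from (6.9) and (6.10) decide $\alpha$ and $\beta$ at each step. The sketch below uses explicit Euler steps; the step size and the sample values $c_1 = 1$, $c_2 = 2$, $T = 5$ are illustrative assumptions. For these values the formulas above give $t^* = 4.5$ and $t_* = 5 - \frac{\sqrt{3}}{2} \approx 4.134$.

```python
import math

def switching_times(c1, c2, T, dt=1e-5):
    """Integrate (6.13) backward from T, choosing alpha and beta by the feedback rules.

    Returns (t_star, t_lower): the backward times at which s1 first becomes >= 0
    and at which s2 first becomes < 0 (None if this does not happen before t = 0).
    """
    s1, s2 = -1.0, 1.0                 # terminal values s1(T) = -1, s2(T) = 1
    t = T
    t_star = t_lower = None
    while t > 0.0 and (t_star is None or t_lower is None):
        alpha = 1.0 if s1 >= 0.0 else 0.0
        beta = 1.0 if s2 < 0.0 else 0.0
        ds1 = c2 * (-1.0 + beta * s2)
        ds2 = c1 * (1.0 + alpha * s1)
        # explicit Euler step backward in time: s(t - dt) = s(t) - dt * s'(t)
        s1, s2, t = s1 - dt * ds1, s2 - dt * ds2, t - dt
        if t_star is None and s1 >= 0.0:
            t_star = t
        if t_lower is None and s2 < 0.0:
            t_lower = t
    return t_star, t_lower

c1, c2, T = 1.0, 2.0, 5.0
t_star, t_lower = switching_times(c1, c2, T)
t_star_exact = T - 1.0 / c2                                  # 4.5
t_lower_exact = T - math.sqrt(2.0 * c2 / c1 - 1.0) / c2      # 5 - sqrt(3)/2
```

The numerical switching times agree with the closed-form $t^*$ and $t_*$ to within the step size.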


CRITIQUE. We have assumed that $x^1 > 0$ and $x^2 > 0$ for all times $t$. If either $x^1$ or $x^2$
hits the constraint, then there will be a corresponding Lagrange multiplier and everything 
becomes much more complicated. 














6.5 REFERENCES 


See Isaacs’ classic book [I] or the more recent book by Lewin [L] for many more worked 
examples. 
CHAPTER 7: INTRODUCTION TO STOCHASTIC CONTROL THEORY


7.1 Introduction and motivation 

7.2 Review of probability theory, Brownian motion 
7.3 Stochastic differential equations 

7.4 Stochastic calculus, Itô chain rule

7.5 Dynamic programming 

7.6 Application: optimal portfolio selection 

7.7 References 


7.1 INTRODUCTION AND MOTIVATION 


This chapter provides a very quick look at the dynamic programming method in stochastic control theory. The rigorous mathematics involved here is really quite subtle, far
beyond the scope of these notes. And so we suggest that readers new to these topics just
scan the following sections, which are intended only as an informal introduction.


7.1.1 STOCHASTIC DIFFERENTIAL EQUATIONS. We begin with a brief overview
of random differential equations. Consider a vector field $f : \mathbb{R}^n \to \mathbb{R}^n$ and the associated
ODE
$$\begin{cases} \dot{x}(t) = f(x(t)) & (t > 0) \\ x(0) = x^0. \end{cases} \tag{7.1}$$


In many cases a better model for some physical phenomenon we want to study is the
stochastic differential equation
$$\begin{cases} \dot{X}(t) = f(X(t)) + \sigma \xi(t) & (t > 0) \\ X(0) = x^0, \end{cases} \tag{7.2}$$
where $\xi(\cdot)$ denotes a "white noise" term causing random fluctuations. We have switched
notation to a capital letter $X(\cdot)$ to indicate that the solution is random. A solution of
(7.2) is a collection of sample paths of a stochastic process, plus probabilistic information
as to the likelihood of the various paths.





7.1.2 STOCHASTIC CONTROL THEORY. Now assume $f : \mathbb{R}^n \times A \to \mathbb{R}^n$ and turn
attention to the controlled stochastic differential equation:
$$\begin{cases} \dot{X}(s) = f(X(s), A(s)) + \sigma \xi(s) & (t \le s \le T) \\ X(t) = x. \end{cases} \tag{SDE}$$
DEFINITIONS. (i) A control $A(\cdot)$ is a mapping of $[t, T]$ into $A$, such that for each time
$t \le s \le T$, $A(s)$ depends only on $s$ and observations of $X(\tau)$ for $t \le \tau \le s$.
(ii) The corresponding payoff functional is
$$P_{x,t}[A(\cdot)] := E\left\{ \int_t^T r(X(s), A(s))\, ds + g(X(T)) \right\}, \tag{P}$$
the expected value over all sample paths for the solution of (SDE). As usual, we are given
the running payoff $r$ and terminal payoff $g$.

BASIC PROBLEM. Our goal is to find an optimal control $A^*(\cdot)$, such that
$$P_{x,t}[A^*(\cdot)] = \max_{A(\cdot)} P_{x,t}[A(\cdot)].$$

DYNAMIC PROGRAMMING. We will adapt the dynamic programming methods
from Chapter 5. To do so, we first define the value function
$$v(x, t) := \sup_{A(\cdot)} P_{x,t}[A(\cdot)].$$
The overall plan to find an optimal control $A^*(\cdot)$ will be (i) to find a Hamilton-Jacobi-Bellman type of PDE that $v$ satisfies, and then (ii) to utilize a solution of this PDE in
designing $A^*$.

It will be particularly interesting to see in §7.5 how the stochastic effects modify the
structure of the Hamilton-Jacobi-Bellman (HJB) equation, as compared with the deterministic case already discussed in Chapter 5.


7.2 REVIEW OF PROBABILITY THEORY, BROWNIAN MOTION. 

This and the next two sections provide a very, very rapid introduction to mathematical 
probability theory and stochastic differential equations. The discussion will be much too 
fast for novices, whom we advise to just scan these sections. See §7.7 for some suggested 


reading to learn more. 


DEFINITION. A probability space is a triple $(\Omega, \mathcal{F}, P)$, where
(i) $\Omega$ is a set,
(ii) $\mathcal{F}$ is a $\sigma$-algebra of subsets of $\Omega$,
(iii) $P$ is a mapping from $\mathcal{F}$ into $[0, 1]$ such that $P(\emptyset) = 0$, $P(\Omega) = 1$, and
$P(\bigcup_{i=1}^\infty A_i) = \sum_{i=1}^\infty P(A_i)$, provided $A_i \cap A_j = \emptyset$ for all $i \ne j$.

A typical point in $\Omega$ is denoted "$\omega$" and is called a sample point. A set $A \in \mathcal{F}$ is called
an event. We call $P$ a probability measure on $\Omega$, and $P(A) \in [0, 1]$ is the probability of the
event $A$.
DEFINITION. A random variable $X$ is a mapping $X : \Omega \to \mathbb{R}$ such that for all $t \in \mathbb{R}$
$$\{ \omega \mid X(\omega) \le t \} \in \mathcal{F}.$$
We mostly employ capital letters to denote random variables. Often the dependence of $X$
on $\omega$ is not explicitly displayed in the notation.

DEFINITION. Let $X$ be a random variable, defined on some probability space $(\Omega, \mathcal{F}, P)$.
The expected value of $X$ is
$$E[X] := \int_\Omega X \, dP.$$

EXAMPLE. Assume $\Omega \subseteq \mathbb{R}^m$, and $P(A) = \int_A f \, d\omega$ for some function $f : \mathbb{R}^m \to [0, \infty)$,
with $\int_\Omega f \, d\omega = 1$. We then call $f$ the density of the probability $P$, and write "$dP = f \, d\omega$".
In this case,
$$E[X] = \int_\Omega X f \, d\omega.$$














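DEFINITION. We define also the variance
$$\operatorname{Var}(X) := E[(X - E(X))^2] = E[X^2] - (E[X])^2.$$

IMPORTANT EXAMPLE. A random variable $X$ is called normal (or Gaussian) with
mean $\mu$, variance $\sigma^2$ if for all $-\infty < a < b < \infty$
$$P(a \le X \le b) = \frac{1}{\sqrt{2 \pi \sigma^2}} \int_a^b e^{-\frac{(x - \mu)^2}{2 \sigma^2}} \, dx.$$
We write "$X$ is $N(\mu, \sigma^2)$".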
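A quick Monte Carlo sketch (Python standard library only; the sample size, seed and parameters are arbitrary illustrative choices) draws $N(\mu, \sigma^2)$ samples with `random.gauss` and computes the sample variance both as $E[(X - E X)^2]$ and as $E[X^2] - (E X)^2$. The two agree up to rounding, and both are close to $\sigma^2$.

```python
import random

def sample_moments(mu, sigma, n=200_000, seed=1):
    """Return (mean, var as E[(X-EX)^2], var as E[X^2]-(EX)^2) for N(mu, sigma^2) draws."""
    rng = random.Random(seed)
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    mean = sum(xs) / n
    var_centered = sum((x - mean) ** 2 for x in xs) / n
    var_raw = sum(x * x for x in xs) / n - mean ** 2
    return mean, var_centered, var_raw

mean, v1, v2 = sample_moments(mu=1.5, sigma=2.0)
```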
DEFINITIONS. (i) Two events $A, B \in \mathcal{F}$ are called independent if
$$P(A \cap B) = P(A) P(B).$$
(ii) Two random variables $X$ and $Y$ are independent if
$$P(X \le t \text{ and } Y \le s) = P(X \le t) P(Y \le s)$$
for all $t, s \in \mathbb{R}$. In other words, $X$ and $Y$ are independent if for all $t, s$ the events $A = \{X \le t\}$ and $B = \{Y \le s\}$ are independent.







DEFINITION. A stochastic process is a collection of random variables $X(t)$ $(0 \le t < \infty)$,
each defined on the same probability space $(\Omega, \mathcal{F}, P)$.
The mapping $t \mapsto X(t, \omega)$ is the $\omega$-th sample path of the process.

DEFINITION. A real-valued stochastic process $W(t)$ is called a Wiener process or
Brownian motion if
(i) $W(0) = 0$,
(ii) each sample path is continuous,
(iii) $W(t)$ is Gaussian with $\mu = 0$, $\sigma^2 = t$ (that is, $W(t)$ is $N(0, t)$),
(iv) for all choices of times $0 < t_1 < t_2 < \cdots < t_k$ the random variables
$$W(t_1), \ W(t_2) - W(t_1), \ \dots, \ W(t_k) - W(t_{k-1})$$
are independent random variables.

Assertion (iv) says that $W$ has "independent increments".
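Properties (i)-(iv) suggest the usual way to simulate a Brownian path on a grid: start at $0$ and add independent $N(0, \Delta t)$ increments. The sketch below (standard library only; the grid size, number of paths and seed are illustrative assumptions) does this and checks empirically that $W(T)$ has mean about $0$ and variance about $T$.

```python
import math
import random

def brownian_path(T, n_steps, rng):
    """Values W(0), W(dt), ..., W(T) built from independent N(0, dt) increments."""
    dt = T / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w

rng = random.Random(2)
T = 2.0
endpoints = [brownian_path(T, 20, rng)[-1] for _ in range(10_000)]
mean_WT = sum(endpoints) / len(endpoints)
var_WT = sum(w * w for w in endpoints) / len(endpoints) - mean_WT ** 2
```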


INTERPRETATION. We heuristically interpret the one-dimensional "white noise" $\xi(\cdot)$
as equalling $\frac{dW(t)}{dt}$. However, this is only formal, since for almost all $\omega$, the sample path
$t \mapsto W(t, \omega)$ is in fact nowhere differentiable.





DEFINITION. An $n$-dimensional Brownian motion is
$$W(t) = (W^1(t), W^2(t), \dots, W^n(t))^T$$
when the $W^i(t)$ are independent one-dimensional Brownian motions.
We use boldface below to denote vector-valued functions and stochastic processes.
7.3 STOCHASTIC DIFFERENTIAL EQUATIONS. 


We discuss next how to understand stochastic differential equations, driven by "white
noise". Consider first of all
$$\begin{cases} \dot{X}(t) = f(X(t)) + \sigma \xi(t) & (t > 0) \\ X(0) = x^0, \end{cases} \tag{7.3}$$
where we informally think of $\xi = \dot{W}$.

DEFINITION. A stochastic process $X(\cdot)$ solves (7.3) if for all times $t \ge 0$, we have
$$X(t) = x^0 + \int_0^t f(X(s))\, ds + \sigma W(t). \tag{7.4}$$


REMARKS. (i) It is possible to solve (7.4) by the method of successive approximation.
For this, we set $X^0(\cdot) \equiv x^0$, and inductively define
$$X^{k+1}(t) := x^0 + \int_0^t f(X^k(s))\, ds + \sigma W(t).$$
It turns out that $X^k(t)$ converges to a limit $X(t)$ for all $t \ge 0$ and $X(\cdot)$ solves the integral
identities (7.4).
(ii) Consider a more general SDE
$$\dot{X}(t) = f(X(t)) + H(X(t)) \xi(t) \qquad (t > 0), \tag{7.5}$$
which we formally rewrite to read:
$$\frac{dX(t)}{dt} = f(X(t)) + H(X(t)) \frac{dW(t)}{dt}$$
and then
$$dX(t) = f(X(t))\, dt + H(X(t))\, dW(t).$$
This is an Itô stochastic differential equation. By analogy with the foregoing, we say $X(\cdot)$
is a solution, with the initial condition $X(0) = x^0$, if
$$X(t) = x^0 + \int_0^t f(X(s))\, ds + \int_0^t H(X(s)) \cdot dW(s)$$
for all times $t \ge 0$. In this expression $\int_0^t H(X(s)) \cdot dW(s)$ is called an Itô stochastic
integral.
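On a single sampled Brownian path, the successive approximations of Remark (i) can be carried out numerically. In the sketch below (an illustration under our own assumptions: $f(x) = -x$, $\sigma = 0.5$, left-endpoint Riemann sums on a uniform grid, and the listed constants) the iterates settle down quickly. Their limit coincides with the Euler-Maruyama recursion $X_{j+1} = X_j + f(X_j)\Delta t + \sigma(W_{j+1} - W_j)$, since that recursion is exactly the discretized integral identity (7.4).

```python
import math
import random

def picard_sde(f, x0, sigma, w, dt, iterations):
    """Successive approximations X^{k+1}(t_j) = x0 + sum_{i<j} f(X^k(t_i)) dt + sigma W(t_j)."""
    X = [x0] * len(w)                      # X^0 = x0 on the whole grid
    for _ in range(iterations):
        new, integral = [], 0.0
        for j in range(len(w)):
            new.append(x0 + integral + sigma * w[j])
            integral += f(X[j]) * dt       # left-endpoint sum over the previous iterate
        X = new
    return X

def euler_maruyama(f, x0, sigma, w, dt):
    """X_{j+1} = X_j + f(X_j) dt + sigma (W_{j+1} - W_j) on the same path."""
    X = [x0]
    for j in range(len(w) - 1):
        X.append(X[-1] + f(X[-1]) * dt + sigma * (w[j + 1] - w[j]))
    return X

rng = random.Random(3)
n, T = 200, 1.0
dt = T / n
w = [0.0]
for _ in range(n):
    w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))

def f(x):
    return -x

X_picard = picard_sde(f, 1.0, 0.5, w, dt, iterations=40)
X_em = euler_maruyama(f, 1.0, 0.5, w, dt)
gap = max(abs(a - b) for a, b in zip(X_picard, X_em))
```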














REMARK. Given a Brownian motion $W(\cdot)$ it is possible to define the Itô stochastic
integral
$$\int_0^t Y \cdot dW$$
for processes $Y(\cdot)$ having the property that for each time $0 \le s \le t$, "$Y(s)$ depends on
$W(\tau)$ for times $0 \le \tau \le s$, but not on $W(\tau)$ for times $s \le \tau$". Such processes are called
"nonanticipating".

We will not here explain the construction of the Itô integral, but will just record one of
its useful properties:
$$E\left[ \int_0^t Y \cdot dW \right] = 0. \tag{7.6}$$
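Numerically, an Itô integral $\int_0^t Y\, dW$ is approximated by left-endpoint sums $\sum_j Y(t_j)(W(t_{j+1}) - W(t_j))$; evaluating $Y$ at the left endpoint is what keeps the integrand nonanticipating. The sketch below (sample sizes and seed are illustrative assumptions) takes $Y = W$ and averages over many paths. The average of the left sums is close to $0$, in line with (7.6), while the right-endpoint sums, which peek at the future increment, average close to $t$ instead.

```python
import math
import random

def ito_vs_right_sums(t=1.0, n_steps=100, n_paths=4000, seed=4):
    """Path averages of the left (Ito) and right-endpoint sums for int_0^t W dW."""
    rng = random.Random(seed)
    dt = t / n_steps
    left_total = right_total = 0.0
    for _ in range(n_paths):
        w, left, right = 0.0, 0.0, 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            left += w * dw             # integrand evaluated before the increment
            right += (w + dw) * dw     # anticipating: uses the value after the increment
            w += dw
        left_total += left
        right_total += right
    return left_total / n_paths, right_total / n_paths

ito_mean, right_mean = ito_vs_right_sums()
```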
7.4 STOCHASTIC CALCULUS, ITÔ CHAIN RULE.


Once the Itô stochastic integral is defined, we have in effect constructed a new calculus,
the properties of which we should investigate. This section explains that the chain rule in
the Itô calculus contains additional terms as compared with the usual chain rule. These
extra stochastic corrections will lead to modifications of the (HJB) equation in §7.5.


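7.4.1 ONE DIMENSION. We suppose that $n = 1$ and
$$\begin{cases} dX(t) = A(t)\, dt + B(t)\, dW(t) & (t \ge 0) \\ X(0) = x^0. \end{cases} \tag{7.7}$$
The expression (7.7) means that
$$X(t) = x^0 + \int_0^t A(s)\, ds + \int_0^t B(s)\, dW(s)$$
for all times $t \ge 0$.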


Let $u : \mathbb{R} \to \mathbb{R}$ and define
$$Y(t) := u(X(t)).$$
We ask: what is the law of motion governing the evolution of $Y$ in time? Or, in other
words, what is $dY(t)$?
It turns out, quite surprisingly, that it is incorrect to calculate
$$dY(t) = d(u(X(t))) = u'(X(t))\, dX(t) = u'(X(t)) (A(t)\, dt + B(t)\, dW(t)).$$


ITÔ CHAIN RULE. We try again and make use of the heuristic principle that "$dW = (dt)^{1/2}$". So let us expand $u$ into a Taylor series, keeping only terms of order $dt$ or larger.
Then
$$\begin{aligned} dY(t) &= d(u(X(t))) \\ &= u'(X(t))\, dX(t) + \tfrac12 u''(X(t))\, dX(t)^2 + \tfrac16 u'''(X(t))\, dX(t)^3 + \cdots \\ &= u'(X(t)) [A(t)\, dt + B(t)\, dW(t)] + \tfrac12 u''(X(t)) [A(t)\, dt + B(t)\, dW(t)]^2 + \cdots, \end{aligned}$$
the last line following from (7.7). Now, formally at least, the heuristic that $dW = (dt)^{1/2}$
implies
$$[A(t)\, dt + B(t)\, dW(t)]^2 = A(t)^2\, dt^2 + 2 A(t) B(t)\, dt\, dW(t) + B^2(t)\, dW(t)^2 = B^2(t)\, dt + o(dt).$$
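Thus, ignoring the $o(dt)$ term, we derive the one-dimensional Itô chain rule
$$dY(t) = d(u(X(t))) = \left[ u'(X(t)) A(t) + \tfrac12 u''(X(t)) B^2(t) \right] dt + u'(X(t)) B(t)\, dW(t). \tag{7.8}$$
This means that for each time $t > 0$
$$u(X(t)) = Y(t) = Y(0) + \int_0^t \left[ u'(X(s)) A(s) + \tfrac12 u''(X(s)) B^2(s) \right] ds + \int_0^t u'(X(s)) B(s)\, dW(s).$$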
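The heuristic $dW^2 = dt$ behind the correction term can be checked on a simulated path: the sum of squared increments $\sum_j (\Delta W_j)^2$ over $[0, t]$ concentrates near $t$. The sketch also checks a consequence for $u(x) = x^2$ and $X = W$ (so $A = 0$, $B = 1$): the Itô chain rule gives $d(W^2) = dt + 2W\, dW$, hence $E[W(t)^2] = t$, whereas the naive chain rule $d(W^2) = 2W\, dW$ would predict $0$. The parameters and seed are illustrative assumptions.

```python
import math
import random

rng = random.Random(5)
t, n = 1.0, 100_000
dt = t / n
increments = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
quadratic_variation = sum(dw * dw for dw in increments)   # approximates "int (dW)^2"

# E[u(W(t))] for u(x) = x^2, sampling W(t) ~ N(0, t) directly
samples = [rng.gauss(0.0, math.sqrt(t)) for _ in range(50_000)]
mean_u = sum(w * w for w in samples) / len(samples)       # the Ito chain rule predicts t
naive_prediction = 0.0                                     # the ordinary chain rule predicts 0
```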


7.4.2 HIGHER DIMENSIONS. We turn now to stochastic differential equations in
higher dimensions. For simplicity, we consider only the special form
$$\begin{cases} dX(t) = A(t)\, dt + \sigma\, dW(t) & (t \ge 0) \\ X(0) = x^0. \end{cases} \tag{7.9}$$
We write
$$X(t) = (X^1(t), X^2(t), \dots, X^n(t))^T.$$
The stochastic differential equation means that for each index $i$, we have $dX^i(t) = A^i(t)\, dt + \sigma\, dW^i(t)$.


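ITÔ CHAIN RULE AGAIN. Let $u : \mathbb{R}^n \times [0, \infty) \to \mathbb{R}$ and put
$$Y(t) := u(X(t), t).$$
What is $dY$? Similarly to the computation above, we calculate
$$dY(t) = d[u(X(t), t)] = u_t(X(t), t)\, dt + \sum_{i=1}^n u_{x_i}(X(t), t)\, dX^i(t) + \frac12 \sum_{i,j=1}^n u_{x_i x_j}(X(t), t)\, dX^i(t)\, dX^j(t).$$
Now use (7.9) and the heuristic rules that
$$dW^i = (dt)^{1/2} \quad \text{and} \quad dW^i\, dW^j = \begin{cases} dt & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases}$$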


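The second rule holds since the components of $dW$ are independent. Plug these identities
into the calculation above and keep only terms of order $dt$ or larger:
$$\begin{aligned} dY(t) &= u_t(X(t), t)\, dt + \sum_{i=1}^n u_{x_i}(X(t), t) [A^i(t)\, dt + \sigma\, dW^i(t)] + \frac{\sigma^2}{2} \sum_{i=1}^n u_{x_i x_i}(X(t), t)\, dt \\ &= u_t(X(t), t)\, dt + \nabla_x u(X(t), t) \cdot [A(t)\, dt + \sigma\, dW(t)] + \frac{\sigma^2}{2} \Delta u(X(t), t)\, dt. \end{aligned} \tag{7.10}$$
This is Itô's chain rule in $n$ dimensions. Here
$$\Delta = \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2}$$
denotes the Laplacian.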
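For a concrete check of (7.10), take $A \equiv 0$ and $u(x) = |x|^2$, so that $\nabla_x u = 2x$ and $\Delta u = 2n$. Taking expectations (the $dW$ term has mean zero by (7.6)) gives $E|X(t)|^2 = |x^0|^2 + n \sigma^2 t$. The sketch samples $X(t) = x^0 + \sigma W(t)$ directly; the dimension, $x^0$, $\sigma$, $t$ and the sample size are illustrative assumptions.

```python
import math
import random

def mean_square_norm(x0, sigma, t, n_samples=40_000, seed=6):
    """Monte Carlo estimate of E|X(t)|^2 for dX = sigma dW, X(0) = x0 in R^n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # component i is x0_i + sigma * W^i(t), with independent W^i(t) ~ N(0, t)
        total += sum((xi + sigma * rng.gauss(0.0, math.sqrt(t))) ** 2 for xi in x0)
    return total / n_samples

x0, sigma, t = [1.0, -2.0, 0.5], 0.8, 1.5
estimate = mean_square_norm(x0, sigma, t)
predicted = sum(xi * xi for xi in x0) + len(x0) * sigma ** 2 * t   # |x0|^2 + n sigma^2 t
```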


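7.4.3 APPLICATIONS TO PDE.
A. A stochastic representation formula for harmonic functions. Consider a
region $U \subseteq \mathbb{R}^n$ and the boundary-value problem
$$\begin{cases} \Delta u = 0 & (x \in U) \\ u = g & (x \in \partial U), \end{cases} \tag{7.11}$$
where, as above, $\Delta = \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2}$ is the Laplacian. We call $u$ a harmonic function.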
We develop a stochastic representation formula for the solution of (7.11). Consider the
random process $X(t) = W(t) + x$; that is,
$$\begin{cases} dX(t) = dW(t) & (t > 0) \\ X(0) = x, \end{cases}$$
where $W(\cdot)$ denotes an $n$-dimensional Brownian motion. To find the link with the PDE
(7.11), we define $Y(t) := u(X(t))$. Then Itô's rule (7.10) gives
$$dY(t) = \nabla u(X(t)) \cdot dW(t) + \tfrac12 \Delta u(X(t))\, dt.$$
Since $\Delta u \equiv 0$, we have
$$dY(t) = \nabla u(X(t)) \cdot dW(t);$$
which means
$$u(X(t)) = Y(t) = Y(0) + \int_0^t \nabla u(X(s)) \cdot dW(s).$$
Let $\tau$ denote the (random) first time the sample path hits $\partial U$. Then, putting $t = \tau$
above, we have
$$u(x) = u(X(\tau)) - \int_0^\tau \nabla u \cdot dW(s).$$
But $u(X(\tau)) = g(X(\tau))$, by definition of $\tau$. Next, average over all sample paths:
$$u(x) = E[g(X(\tau))] - E\left[ \int_0^\tau \nabla u \cdot dW \right].$$
The last term equals zero, according to (7.6). Consequently,
$$u(x) = E[g(X(\tau))].$$

INTERPRETATION. Consider all the sample paths of the Brownian motion starting
at $x$ and take the average of $g(X(\tau))$. This gives the value of $u$ at $x$.
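The representation formula translates directly into a Monte Carlo method. The sketch below is our own illustration, with these assumptions: $U$ is the unit disk in $\mathbb{R}^2$; $g(y) = y_1^2 - y_2^2$, which is the boundary trace of the harmonic function $u(x) = x_1^2 - x_2^2$; paths advance by independent Gaussian steps of variance $\Delta t$ in each coordinate; and a path that steps outside is projected radially back onto the circle. The discrete time step introduces a small bias, so the estimate only approximates $u(x)$.

```python
import math
import random

def harmonic_mc(x, g, dt=2e-3, n_paths=3000, seed=7):
    """Estimate u(x) = E[g(X(tau))] on the unit disk, X = x + W, by simulation."""
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x1, x2 = x
        while x1 * x1 + x2 * x2 < 1.0:           # still inside U
            x1 += rng.gauss(0.0, sd)
            x2 += rng.gauss(0.0, sd)
        r = math.hypot(x1, x2)
        total += g(x1 / r, x2 / r)                # project the exit point onto the boundary
    return total / n_paths

def g(y1, y2):
    return y1 * y1 - y2 * y2

start = (0.5, 0.3)
estimate = harmonic_mc(start, g)
exact = start[0] ** 2 - start[1] ** 2             # u(x) = x1^2 - x2^2 = 0.16
```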














B. A time-dependent problem. We next modify the previous calculation to cover
the terminal-value problem for the inhomogeneous backwards heat equation:
$$\begin{cases} u_t(x, t) + \frac{\sigma^2}{2} \Delta u(x, t) = f(x, t) & (x \in \mathbb{R}^n, \ 0 < t < T) \\ u(x, T) = g(x). \end{cases} \tag{7.11}$$











Fix $x \in \mathbb{R}^n$, $0 < t < T$. We introduce the stochastic process
$$\begin{cases} dX(s) = \sigma\, dW(s) & (s \ge t) \\ X(t) = x. \end{cases}$$
Use Itô's chain rule (7.10) to compute $du(X(s), s)$:
$$du(X(s), s) = u_s(X(s), s)\, ds + \nabla_x u(X(s), s) \cdot dX(s) + \frac{\sigma^2}{2} \Delta u(X(s), s)\, ds.$$
Now integrate for times $t \le s \le T$, to discover
$$u(X(T), T) = u(X(t), t) + \int_t^T \frac{\sigma^2}{2} \Delta u(X(s), s) + u_s(X(s), s)\, ds + \int_t^T \sigma \nabla_x u(X(s), s) \cdot dW(s).$$
Then, since $u$ solves (7.11), taking expected values (the last integral has mean zero by (7.6)) gives
$$u(x, t) = E\left( g(X(T)) - \int_t^T f(X(s), s)\, ds \right).$$














This is a stochastic representation formula for the solution u of the PDE (7.11). 
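As a check on the formula, the sketch below uses the one-dimensional test problem $\sigma = 1$, $g(x) = x^2$, $f(x, t) = x$ (our own choice), for which $u(x, t) = x^2 + (T - t) - x(T - t)$ solves (7.11) exactly: indeed $u_t + \frac12 u_{xx} = (x - 1) + 1 = x$ and $u(x, T) = x^2$. The time integral is approximated by a left-endpoint sum; the step and sample counts and the seed are illustrative assumptions.

```python
import math
import random

def heat_mc(x, t, T, g, f, sigma=1.0, n_steps=100, n_paths=4000, seed=8):
    """Estimate u(x, t) = E[g(X(T)) - int_t^T f(X(s), s) ds] with dX = sigma dW, X(t) = x."""
    rng = random.Random(seed)
    ds = (T - t) / n_steps
    total = 0.0
    for _ in range(n_paths):
        X, s, integral = x, t, 0.0
        for _ in range(n_steps):
            integral += f(X, s) * ds                    # left-endpoint rule
            X += sigma * rng.gauss(0.0, math.sqrt(ds))
            s += ds
        total += g(X) - integral
    return total / n_paths

def g(y):
    return y * y

def f(y, s):
    return y

x, t, T = 0.5, 0.0, 1.0
estimate = heat_mc(x, t, T, g, f)
exact = x * x + (T - t) - x * (T - t)                   # = 0.75
```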


7.5 DYNAMIC PROGRAMMING. 
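We now turn our attention to controlled stochastic differential equations, of the form 

$$
(\mathrm{SDE})\qquad
\begin{cases}
dX(s) = f(X(s), A(s))\, ds + \sigma\, dW(s) & (t \le s \le T) \\
X(t) = x.
\end{cases}
$$

Therefore 

$$
X(\tau) = x + \int_t^{\tau} f(X(s), A(s))\, ds + \sigma [W(\tau) - W(t)]
$$

for all $t \le \tau \le T$. We introduce as well the expected payoff functional 

$$
(\mathrm{P})\qquad P_{x,t}[A(\cdot)] := E\left\{ \int_t^T r(X(s), A(s))\, ds + g(X(T)) \right\}.
$$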


The value function is 

$$
v(x,t) := \sup_{A(\cdot) \in \mathcal{A}} P_{x,t}[A(\cdot)].
$$


We will employ the method of dynamic programming. To do so, we must (i) find a PDE 
satisfied by v, and then (ii) use this PDE to design an optimal control A*(-). 


7.5.1 A PDE FOR THE VALUE FUNCTION. 
Let $A(\cdot)$ be any control, and suppose we use it for times $t \le s \le t+h$, $h > 0$, and 
thereafter employ an optimal control. Then 

$$
(7.12)\qquad v(x,t) \ge E\left\{ \int_t^{t+h} r(X(s), A(s))\, ds + v(X(t+h), t+h) \right\},
$$

and the inequality in (7.12) becomes an equality if we take $A(\cdot) = A^*(\cdot)$, an optimal 
control. 


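Now from (7.12) we see for an arbitrary control that 

$$
\begin{aligned}
0 &\ge E\left\{ \int_t^{t+h} r(X(s), A(s))\, ds + v(X(t+h), t+h) - v(x,t) \right\} \\
&= E\left\{ \int_t^{t+h} r\, ds \right\} + E\{ v(X(t+h), t+h) - v(x,t) \}.
\end{aligned}
$$

Recall next Itô's formula: 

$$
\begin{aligned}
(7.13)\qquad dv(X(s),s) &= v_t(X(s),s)\, ds + \sum_{i=1}^n v_{x_i}(X(s),s)\, dX^i(s)
+ \frac{1}{2} \sum_{i,j=1}^n v_{x_i x_j}(X(s),s)\, dX^i(s)\, dX^j(s) \\
&= v_t\, ds + \nabla_x v \cdot (f\, ds + \sigma\, dW(s)) + \frac{\sigma^2}{2} \Delta v\, ds.
\end{aligned}
$$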
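This means that 

$$
v(X(t+h), t+h) - v(X(t), t) = \int_t^{t+h} \left( v_t + \nabla_x v \cdot f + \frac{\sigma^2}{2} \Delta v \right) ds
+ \int_t^{t+h} \sigma \nabla_x v \cdot dW(s);
$$

and so we can take expected values, to deduce 

$$
(7.14)\qquad E[v(X(t+h), t+h) - v(x,t)] = E\left[ \int_t^{t+h} \left( v_t + \nabla_x v \cdot f + \frac{\sigma^2}{2} \Delta v \right) ds \right].
$$

Inserting (7.14) into the previous inequality, we derive the formula 

$$
0 \ge E\left[ \int_t^{t+h} \left( r + v_t + \nabla_x v \cdot f + \frac{\sigma^2}{2} \Delta v \right) ds \right].
$$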
t 





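Divide by h: 

$$
0 \ge E\left[ \frac{1}{h} \int_t^{t+h} r(X(s), A(s)) + v_t(X(s), s) + f(X(s), A(s)) \cdot \nabla_x v(X(s), s) + \frac{\sigma^2}{2} \Delta v(X(s), s)\, ds \right].
$$

If we send $h \to 0$, recall that $X(t) = x$ and set $A(t) := a \in A$, we see that 

$$
0 \ge r(x,a) + v_t(x,t) + f(x,a) \cdot \nabla_x v(x,t) + \frac{\sigma^2}{2} \Delta v(x,t).
$$

The above inequality holds for all x, t, a and is actually an equality for the optimal control. 
Hence 

$$
\max_{a \in A} \left\{ v_t + f \cdot \nabla_x v + \frac{\sigma^2}{2} \Delta v + r \right\} = 0.
$$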
acA 2 


Stochastic Hamilton-Jacobi-Bellman equation. In summary, we have shown that 
the value function v for our stochastic control problem solves this PDE: 

$$
(\mathrm{HJB})\qquad
\begin{cases}
v_t(x,t) + \dfrac{\sigma^2}{2} \Delta v(x,t) + \max_{a \in A} \{ f(x,a) \cdot \nabla_x v(x,t) + r(x,a) \} = 0 \\
v(x,T) = g(x).
\end{cases}
$$

This semilinear parabolic PDE is the stochastic Hamilton-Jacobi-Bellman equation. 


Our derivation has been very imprecise: see the references for rigorous derivations. 
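As a worked illustration of (HJB), with data that are our own choice rather than the notes', take $n = 1$, control set $A = \mathbb{R}$, $f(x,a) = a$, $r(x,a) = -a^2/2$ and $g(x) = -x^2/2$. The maximization can then be done by hand, and a quadratic ansatz reduces (HJB) to two ODEs:

```latex
\begin{aligned}
&\max_{a\in\mathbb{R}}\Big\{a\,v_x-\tfrac12 a^2\Big\}=\tfrac12 v_x^2
  \quad(\text{attained at } a=v_x),\\
&v_t+\tfrac{\sigma^2}{2}v_{xx}+\tfrac12 v_x^2=0,\qquad v(x,T)=-\tfrac12 x^2,\\
&v(x,t)=-\tfrac12 q(t)x^2+c(t):\quad q'=q^2,\ q(T)=1;\qquad
  c'=\tfrac{\sigma^2}{2}\,q,\ c(T)=0,\\
&v(x,t)=-\frac{x^2}{2(1+T-t)}-\frac{\sigma^2}{2}\log(1+T-t).
\end{aligned}
```

Substituting this v back into the PDE, the $x^2$ terms and the constant terms cancel separately, and $v(x,T) = -x^2/2$, so it does solve (HJB) for these data.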


7.5.2 DESIGNING AN OPTIMAL CONTROL. 
Assume now that we can somehow solve the (HJB) equation, and therefore know 
the function v. We can then compute for each point (x,t) a value $a \in A$ for which 
$\nabla_x v(x,t) \cdot f(x,a) + r(x,a)$ attains its maximum. In other words, for each (x,t) we choose 
$a = \alpha(x,t)$ such that 

$$
\max_{a \in A} \left[ f(x,a) \cdot \nabla_x v(x,t) + r(x,a) \right]
$$

occurs for $a = \alpha(x,t)$. Next solve 

$$
\begin{cases}
dX^*(s) = f(X^*(s), \alpha(X^*(s), s))\, ds + \sigma\, dW(s) & (t \le s \le T) \\
X^*(t) = x,
\end{cases}
$$

assuming this is possible. Then $A^*(s) = \alpha(X^*(s), s)$ is an optimal feedback control. 
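The feedback recipe can be tried on a linear-quadratic test case of our own choosing (not from the notes): $n = 1$, $A = \mathbb{R}$, $f(x,a) = a$, $r(x,a) = -a^2/2$, $g(x) = -x^2/2$. For these data $v(x,t) = -\frac{x^2}{2(1+T-t)} - \frac{\sigma^2}{2}\log(1+T-t)$ solves (HJB), and the maximizer of $a\,v_x - a^2/2$ is $\alpha(x,t) = v_x(x,t) = -x/(1+T-t)$. The sketch below simulates $X^*$ with the Euler-Maruyama scheme under this feedback and checks that the estimated expected payoff is close to $v(x,0)$. All names and parameter values are hypothetical.

```python
import math
import random


def value(x, t, T=1.0, sigma=0.5):
    # Solution of (HJB) for the hypothetical data f = a, r = -a^2/2, g = -x^2/2
    return -x * x / (2.0 * (1.0 + T - t)) - 0.5 * sigma ** 2 * math.log(1.0 + T - t)


def alpha(x, t, T=1.0):
    # Maximizer of a * v_x - a^2/2 over a in R, namely a = v_x(x, t)
    return -x / (1.0 + T - t)


def simulate_payoff(x, T=1.0, sigma=0.5, n_paths=3000, n_steps=100, seed=2):
    """Estimate E[int_0^T r(X*, A*) ds + g(X*(T))] under A*(s) = alpha(X*(s), s)."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _path in range(n_paths):
        X, running = x, 0.0
        for k in range(n_steps):
            a = alpha(X, k * dt, T)
            running += -0.5 * a * a * dt                          # running payoff r(X, a)
            X += a * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)   # dX = f(X, a) ds + sigma dW
        total += running - 0.5 * X * X                            # terminal payoff g(X(T))
    return total / n_paths
```

With $x = 1$ the exact value is $v(1,0) = -\tfrac14 - \tfrac18 \log 2$; the simulated payoff should match it up to time-step and sampling error, while any other control (for example $A \equiv 0$) should do worse.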


7.6 APPLICATION: OPTIMAL PORTFOLIO SELECTION. 


Following is an interesting example worked out by Merton [M]. In this model we have the 
option of investing some of our wealth in either a risk-free bond (growing at a fixed rate) 
or a risky stock (changing according to a random differential equation). We also intend to 
consume some of our wealth as time evolves. As time goes on, how can we best (i) allot 
our money among the investment opportunities and (ii) select how much to consume? 


We assume time runs from 0 to a terminal time T. Introduce the variables 

$X(t)$ = wealth at time t (random) 
$b(t)$ = price of a risk-free investment, say a bond 
$S(t)$ = price of risky investment, say a stock (random) 
$\alpha^1(t)$ = fraction of wealth invested in the stock 
$\alpha^2(t)$ = rate at which wealth is consumed. 

Then 

$$
(7.15)\qquad 0 \le \alpha^1(t) \le 1, \qquad 0 \le \alpha^2(t) \le X(t) \qquad (0 \le t \le T).
$$


We assume that the value of the bond grows at the known rate $r > 0$: 

$$
(7.16)\qquad db = rb\, dt;
$$

whereas the price of the risky stock changes according to 

$$
(7.17)\qquad dS = RS\, dt + \sigma S\, dW.
$$

Here $r, R, \sigma$ are constants, with 

$$
R > r > 0, \qquad \sigma \ne 0.
$$

This means that the average return on the stock is greater than that for the risk-free bond. 
According to (7.16) and (7.17), the total wealth evolves as 

$$
(7.18)\qquad dX = (1 - \alpha^1(t)) X r\, dt + \alpha^1(t) X (R\, dt + \sigma\, dW) - \alpha^2(t)\, dt.
$$


Let 

$$
Q := \{ (x,t) \mid 0 \le t \le T,\ x \ge 0 \}
$$

and denote by $\tau$ the (random) first time $X(\cdot)$ leaves Q. Write $A(t) = (\alpha^1(t), \alpha^2(t))^T$ for 
the control. 
The payoff functional to be maximized is 

$$
P_{x,t}[A(\cdot)] = E\left( \int_t^{\tau} e^{-\rho s} F(\alpha^2(s))\, ds \right),
$$

where F is a given utility function and $\rho > 0$ is the discount rate. 

Guided by theory similar to that developed in §7.5, we discover that the corresponding 
(HJB) equation is 

$$
(7.19)\qquad u_t + \max_{0 \le a_1 \le 1,\ a_2 \ge 0} \left\{ \frac{(a_1 x \sigma)^2}{2} u_{xx} + \big( (1 - a_1) x r + a_1 x R - a_2 \big) u_x + e^{-\rho t} F(a_2) \right\} = 0,
$$

with the boundary conditions that 

$$
(7.20)\qquad u(0,t) = 0, \qquad u(x,T) = 0.
$$


We compute the maxima to find 

$$
(7.21)\qquad \alpha^{1*} = \frac{-(R - r) u_x}{\sigma^2 x u_{xx}}, \qquad F'(\alpha^{2*}) = e^{\rho t} u_x,
$$

provided that the constraints $0 \le \alpha^{1*} \le 1$ and $0 \le \alpha^{2*} \le x$ are valid: we will need to 
worry about this later. If we can find a formula for the value function u, we will then be 
able to use (7.21) to compute optimal controls. 


Finding an explicit solution. To go further, we assume the utility function F has the 
explicit form 

$$
F(a) = a^{\gamma} \qquad (0 < \gamma < 1).
$$

Next we guess that our value function has the form 

$$
u(x,t) = g(t) x^{\gamma},
$$

for some function g to be determined. Then (7.21) implies that 

$$
\alpha^{1*} = \frac{R - r}{\sigma^2 (1 - \gamma)}, \qquad \alpha^{2*} = \left[ e^{\rho t} g(t) \right]^{\frac{1}{\gamma - 1}} x.
$$


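Plugging our guess for the form of u into (7.19) and setting $a_1 = \alpha^{1*}$, $a_2 = \alpha^{2*}$, we find 

$$
\left( g'(t) + \nu \gamma g(t) + (1 - \gamma) g(t) \left( e^{\rho t} g(t) \right)^{\frac{1}{\gamma - 1}} \right) x^{\gamma} = 0
$$

for the constant 

$$
\nu := \frac{(R - r)^2}{2 \sigma^2 (1 - \gamma)} + r.
$$

Now put 

$$
h(t) := \left( e^{\rho t} g(t) \right)^{\frac{1}{1 - \gamma}}
$$

to obtain the linear ODE $h' = \frac{\rho - \nu\gamma}{1 - \gamma} h - 1$, with $h(T) = 0$ by (7.20) (we assume $\rho \ne \nu\gamma$). Solving it, we find 

$$
g(t) = e^{-\rho t} \left[ \frac{1 - \gamma}{\rho - \nu\gamma} \left( 1 - e^{-\frac{(\rho - \nu\gamma)(T - t)}{1 - \gamma}} \right) \right]^{1 - \gamma}.
$$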














If $R - r \le \sigma^2 (1 - \gamma)$, then $0 \le \alpha^{1*} \le 1$ and $\alpha^{2*} \ge 0$, as required. 
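A small numerical check of the explicit solution, offered as a sketch: the code evaluates $g(t)$, $\alpha^{1*}$ and $\alpha^{2*}$ from the formulas above, plugs $u = g(t)x^{\gamma}$ into the left side of (7.19) with $a_1 = \alpha^{1*}$, $a_2 = \alpha^{2*}$, and approximates $u_t$ by a central difference. The residual should vanish up to finite-difference error. The parameter values ($r = 0.03$, $R = 0.08$, $\sigma = 0.4$, $\rho = 0.1$, $\gamma = 0.5$, $T = 1$) and function names are hypothetical; the code assumes $\rho \ne \nu\gamma$ and $t < T$.

```python
import math


def merton(r, R, sigma, rho, gamma, T):
    """Explicit solution from the text: returns (nu, alpha1_star, g, alpha2_star)."""
    nu = (R - r) ** 2 / (2 * sigma ** 2 * (1 - gamma)) + r
    k = (rho - nu * gamma) / (1 - gamma)   # h' = k h - 1, h(T) = 0; assumes k != 0

    def g(t):
        h = (1 - math.exp(-k * (T - t))) / k
        return math.exp(-rho * t) * h ** (1 - gamma)

    alpha1 = (R - r) / (sigma ** 2 * (1 - gamma))   # constant fraction held in the stock

    def alpha2(x, t):
        # optimal consumption rate; it blows up as t -> T, so only use t < T
        return (math.exp(rho * t) * g(t)) ** (1 / (gamma - 1)) * x

    return nu, alpha1, g, alpha2


def hjb_residual(x, t, r, R, sigma, rho, gamma, T, dt=1e-5):
    """Left side of (7.19) for u = g(t) x^gamma with a1, a2 set to the optimal values."""
    _, a1, g, alpha2 = merton(r, R, sigma, rho, gamma, T)
    a2 = alpha2(x, t)
    u_t = (g(t + dt) - g(t - dt)) / (2 * dt) * x ** gamma   # central difference in t
    u_x = gamma * g(t) * x ** (gamma - 1)
    u_xx = gamma * (gamma - 1) * g(t) * x ** (gamma - 2)
    return (u_t
            + 0.5 * (a1 * x * sigma) ** 2 * u_xx
            + ((1 - a1) * x * r + a1 * x * R - a2) * u_x
            + math.exp(-rho * t) * a2 ** gamma)
```

With these parameters $\alpha^{1*} = 0.05/0.08 = 0.625$, so the condition $R - r \le \sigma^2(1-\gamma)$ holds, and `hjb_residual(2.0, 0.5, 0.03, 0.08, 0.4, 0.1, 0.5, 1.0)` should be zero to within finite-difference accuracy.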





7.7 REFERENCES 


The lecture notes [E], available online, present a fast but more detailed discussion of 
stochastic differential equations. See also Øksendal's nice book [O]. 

Good books on stochastic optimal control include Fleming-Rishel [F-R], Fleming-Soner 
[F-S], and Krylov [Kr]. 




APPENDIX: PROOFS OF THE PONTRYAGIN MAXIMUM PRINCIPLE 


A.1 Simple control variations 

A.2 Free endpoint problem, no running payoff 
A.3 Free endpoint problem with running payoffs 
A.4 Multiple control variations 

A.5 Fixed endpoint problem 

A.6 References 


A.1. SIMPLE CONTROL VARIATIONS. 
Recall that the response $x(\cdot)$ to a given control $\alpha(\cdot)$ is the unique solution of the system 
of differential equations: 

$$
(\mathrm{ODE})\qquad
\begin{cases}
\dot{x}(t) = f(x(t), \alpha(t)) & (t \ge 0) \\
x(0) = x^0.
\end{cases}
$$
We investigate in this section how certain simple changes in the control affect the response. 


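DEFINITION. Fix a time $s > 0$ and a control parameter value $a \in A$. Select $\varepsilon > 0$ so 
small that $0 < s - \varepsilon < s$ and define then the modified control 

$$
(8.1)\qquad \alpha_\varepsilon(t) :=
\begin{cases}
a & \text{if } s - \varepsilon < t < s \\
\alpha(t) & \text{otherwise.}
\end{cases}
$$

We call $\alpha_\varepsilon(\cdot)$ a simple variation of $\alpha(\cdot)$. 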


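Let $x_\varepsilon(\cdot)$ be the corresponding response to our system: 

$$
(8.2)\qquad
\begin{cases}
\dot{x}_\varepsilon(t) = f(x_\varepsilon(t), \alpha_\varepsilon(t)) & (t > 0) \\
x_\varepsilon(0) = x^0.
\end{cases}
$$

We want to understand how our choices of s and a cause $x_\varepsilon(\cdot)$ to differ from $x(\cdot)$, for small 
$\varepsilon > 0$. 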


NOTATION. Define the matrix-valued function $A : [0,\infty) \to \mathbb{M}^{n \times n}$ by 

$$
A(t) := \nabla_x f(x(t), \alpha(t)).
$$

In particular, the $(i,j)^{\text{th}}$ entry of the matrix $A(t)$ is 

$$
f^i_{x_j}(x(t), \alpha(t)) \qquad (1 \le i, j \le n).
$$


We first quote a standard perturbation assertion for ordinary differential equations: 
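LEMMA A.1 (CHANGING INITIAL CONDITIONS). Let $y_\varepsilon(\cdot)$ solve the initial-value problem: 

$$
\begin{cases}
\dot{y}_\varepsilon(t) = f(y_\varepsilon(t), \alpha(t)) & (t \ge 0) \\
y_\varepsilon(0) = x^0 + \varepsilon y^0 + o(\varepsilon).
\end{cases}
$$

Then 

$$
y_\varepsilon(t) = x(t) + \varepsilon y(t) + o(\varepsilon) \quad \text{as } \varepsilon \to 0,
$$

uniformly for t in compact subsets of $[0,\infty)$, where 

$$
\begin{cases}
\dot{y}(t) = A(t) y(t) & (t \ge 0) \\
y(0) = y^0.
\end{cases}
$$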


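Returning now to the dynamics (8.2), we establish 

LEMMA A.2 (DYNAMICS AND SIMPLE CONTROL VARIATIONS). We 
have 

$$
(8.3)\qquad x_\varepsilon(t) = x(t) + \varepsilon y(t) + o(\varepsilon) \quad \text{as } \varepsilon \to 0,
$$

uniformly for t in compact subsets of $[0,\infty)$, where 

$$
(8.4)\qquad y(t) \equiv 0 \qquad (0 \le t \le s)
$$

and 

$$
(8.5)\qquad
\begin{cases}
\dot{y}(t) = A(t) y(t) & (t \ge s) \\
y(s) = y^s,
\end{cases}
$$

for 

$$
(8.6)\qquad y^s := f(x(s), a) - f(x(s), \alpha(s)).
$$

NOTATION. We will sometimes write $y(t) = Y(t,s) y^s$ $(t \ge s)$ when (8.5) holds. 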


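Proof. Clearly $x_\varepsilon(t) = x(t)$ for $0 \le t \le s - \varepsilon$. For times $s - \varepsilon \le t \le s$, we have 

$$
x_\varepsilon(t) - x(t) = \int_{s-\varepsilon}^{t} f(x_\varepsilon(r), a) - f(x(r), \alpha(r))\, dr.
$$

Thus, in particular, 

$$
x_\varepsilon(s) - x(s) = [f(x(s), a) - f(x(s), \alpha(s))] \varepsilon + o(\varepsilon).
$$

On the time interval $[s,\infty)$, $x(\cdot)$ and $x_\varepsilon(\cdot)$ both solve the same ODE, but with differing 
initial conditions given by 

$$
x_\varepsilon(s) = x(s) + \varepsilon y^s + o(\varepsilon),
$$

for $y^s$ defined by (8.6). 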


According to Lemma A.1, we have 

$$
x_\varepsilon(t) = x(t) + \varepsilon y(t) + o(\varepsilon) \qquad (t \ge s),
$$

the function $y(\cdot)$ solving (8.5). 
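Lemma A.2 can be checked numerically on a scalar example chosen here for illustration (it is not from the notes): $f(x,a) = a - x^2$, nominal control $\alpha \equiv 0$ and $x^0 = 1$, so that $x(t) = 1/(1+t)$, $A(t) = -2x(t)$ and $y^s = a$. Solving (8.5) by hand gives $y(t) = a\left(\frac{1+s}{1+t}\right)^2$ for $t \ge s$, and by (8.3) the difference quotient $(x_\varepsilon(t) - x(t))/\varepsilon$ should approach this value. The helper names are hypothetical; the ODEs are integrated with a classical fourth-order Runge-Kutta scheme.

```python
def rk4_scalar(F, x, t0, t1, steps=1000):
    """Classical RK4 for the autonomous scalar ODE x' = F(x) from t0 to t1."""
    h = (t1 - t0) / steps
    for _ in range(steps):
        k1 = F(x)
        k2 = F(x + 0.5 * h * k1)
        k3 = F(x + 0.5 * h * k2)
        k4 = F(x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x


def response_at(T, eps, s, a, x0=1.0):
    """x_eps(T) for f(x, a) = a - x^2, nominal control 0 and the simple variation (8.1)."""
    x = rk4_scalar(lambda z: -z * z, x0, 0.0, s - eps)     # alpha = 0 on [0, s - eps]
    x = rk4_scalar(lambda z: a - z * z, x, s - eps, s)     # alpha_eps = a on [s - eps, s]
    return rk4_scalar(lambda z: -z * z, x, s, T)           # alpha = 0 on [s, T]


def y_exact(T, s, a):
    # x(t) = 1/(1+t), A(t) = -2 x(t), y(s) = a, hence y(T) = a ((1+s)/(1+T))^2 for T >= s
    return a * ((1.0 + s) / (1.0 + T)) ** 2
```

For example, with $s = 0.5$, $a = 2$ and $T = 2$, the quotient `(response_at(2.0, 1e-5, 0.5, 2.0) - response_at(2.0, 0.0, 0.5, 2.0)) / 1e-5` should be close to `y_exact(2.0, 0.5, 2.0)`, which is $0.5$.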


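A.2. FREE ENDPOINT PROBLEM, NO RUNNING PAYOFF. 
STATEMENT. We return to our usual dynamics 

$$
(\mathrm{ODE})\qquad
\begin{cases}
\dot{x}(t) = f(x(t), \alpha(t)) & (0 \le t \le T) \\
x(0) = x^0,
\end{cases}
$$

and introduce also the terminal payoff functional 

$$
(\mathrm{P})\qquad P[\alpha(\cdot)] = g(x(T)),
$$

to be maximized. We assume that $\alpha^*(\cdot)$ is an optimal control for this problem, corresponding 
to the optimal trajectory $x^*(\cdot)$. 
We are taking the running payoff $r \equiv 0$, and hence the control theory Hamiltonian is 

$$
H(x,p,a) = f(x,a) \cdot p.
$$

We must find $p^* : [0,T] \to \mathbb{R}^n$, such that 

$$
(\mathrm{ADJ})\qquad \dot{p}^*(t) = -\nabla_x H(x^*(t), p^*(t), \alpha^*(t)) \qquad (0 \le t \le T),
$$

$$
(\mathrm{T})\qquad p^*(T) = \nabla g(x^*(T)),
$$

and 

$$
(\mathrm{M})\qquad H(x^*(t), p^*(t), \alpha^*(t)) = \max_{a \in A} H(x^*(t), p^*(t), a) \qquad (0 \le t \le T).
$$

To simplify notation we henceforth drop the superscript * and so write $x(\cdot)$ for $x^*(\cdot)$, 
$\alpha(\cdot)$ for $\alpha^*(\cdot)$, etc. Introduce the function $A(\cdot) = \nabla_x f(x(\cdot), \alpha(\cdot))$ and the control variation 
$\alpha_\varepsilon(\cdot)$, as in the previous section. 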














THE COSTATE. We now define $p : [0,T] \to \mathbb{R}^n$ to be the unique solution of the terminal-value problem 

$$
(8.7)\qquad
\begin{cases}
\dot{p}(t) = -A^T(t) p(t) & (0 \le t \le T) \\
p(T) = \nabla g(x(T)).
\end{cases}
$$

We employ p(-) to help us calculate the variation of the terminal payoff: 
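LEMMA A.3 (VARIATION OF TERMINAL PAYOFF). We have 

$$
(8.8)\qquad \frac{d}{d\varepsilon} P[\alpha_\varepsilon(\cdot)] \Big|_{\varepsilon=0} = p(s) \cdot [f(x(s), a) - f(x(s), \alpha(s))].
$$

Proof. According to Lemma A.2, 

$$
P[\alpha_\varepsilon(\cdot)] = g(x_\varepsilon(T)) = g(x(T) + \varepsilon y(T) + o(\varepsilon)),
$$

where $y(\cdot)$ satisfies (8.4), (8.5). We then compute 

$$
(8.9)\qquad \frac{d}{d\varepsilon} P[\alpha_\varepsilon(\cdot)] \Big|_{\varepsilon=0} = \nabla g(x(T)) \cdot y(T).
$$

On the other hand, (8.5) and (8.7) imply, for $s \le t \le T$, 

$$
\begin{aligned}
\frac{d}{dt} \big( p(t) \cdot y(t) \big) &= \dot{p}(t) \cdot y(t) + p(t) \cdot \dot{y}(t) \\
&= -A^T(t) p(t) \cdot y(t) + p(t) \cdot A(t) y(t) \\
&= 0.
\end{aligned}
$$

Hence 

$$
\nabla g(x(T)) \cdot y(T) = p(T) \cdot y(T) = p(s) \cdot y(s) = p(s) \cdot y^s.
$$

Since $y^s = f(x(s), a) - f(x(s), \alpha(s))$, this identity and (8.9) imply (8.8). 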
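Formula (8.8) can be sanity-checked numerically. The sketch below uses hypothetical choices of ours: pendulum dynamics $f(x,a) = (x_2,\ a - \sin x_1)$, nominal control $\alpha \equiv 0$, $x^0 = (1, 0)$ and terminal payoff $g(x) = x_1$. It compares a one-sided finite difference of $\varepsilon \mapsto P[\alpha_\varepsilon(\cdot)]$ with $p(s) \cdot y^s$, where $p$ solves (8.7) backward in time (integrated jointly with $x$ so that $A(t)$ is available). Integration uses a classical RK4 scheme; all function names are illustrative.

```python
import math


def f(x, a):
    # hypothetical dynamics (controlled pendulum), x = (angle, angular velocity)
    return [x[1], a - math.sin(x[0])]


def At_times(x, p):
    # A(t)^T p, with A = grad_x f = [[0, 1], [-cos x1, 0]]
    return [-math.cos(x[0]) * p[1], p[0]]


def rk4(F, z, t0, t1, steps=2000):
    """Classical RK4 for the autonomous system z' = F(z); t1 < t0 integrates backward."""
    h = (t1 - t0) / steps
    for _ in range(steps):
        k1 = F(z)
        k2 = F([zi + 0.5 * h * ki for zi, ki in zip(z, k1)])
        k3 = F([zi + 0.5 * h * ki for zi, ki in zip(z, k2)])
        k4 = F([zi + h * ki for zi, ki in zip(z, k3)])
        z = [zi + h / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
             for zi, q1, q2, q3, q4 in zip(z, k1, k2, k3, k4)]
    return z


def payoff(eps, s, a, x0, T):
    """P[alpha_eps] = g(x_eps(T)) with g(x) = x1, nominal control 0, simple variation (8.1)."""
    x = rk4(lambda z: f(z, 0.0), x0, 0.0, s - eps)
    x = rk4(lambda z: f(z, a), x, s - eps, s)
    x = rk4(lambda z: f(z, 0.0), x, s, T)
    return x[0]


def adjoint_derivative(s, a, x0, T):
    """Right side of (8.8): p(s) . [f(x(s), a) - f(x(s), alpha(s))] with p from (8.7)."""
    xs = rk4(lambda z: f(z, 0.0), x0, 0.0, s)
    xT = rk4(lambda z: f(z, 0.0), xs, s, T)

    def backward(z):
        # joint system for (x, p): x' = f(x, 0), p' = -A^T p
        x, p = z[:2], z[2:]
        return f(x, 0.0) + [-c for c in At_times(x, p)]

    p_s = rk4(backward, xT + [1.0, 0.0], T, s)[2:]   # p(T) = grad g(x(T)) = (1, 0)
    ys = [u - w for u, w in zip(f(xs, a), f(xs, 0.0))]
    return sum(pi * yi for pi, yi in zip(p_s, ys))
```

With $s = 0.5$, $a = 1$, $T = 1$, the difference quotient `(payoff(1e-5, 0.5, 1.0, [1.0, 0.0], 1.0) - payoff(0.0, 0.5, 1.0, [1.0, 0.0], 1.0)) / 1e-5` should agree with `adjoint_derivative(0.5, 1.0, [1.0, 0.0], 1.0)` to several digits. The adjoint route needs one backward solve per s, however many values a are tested.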














We now restore the superscripts * in our notation. 
THEOREM A.4 (PONTRYAGIN MAXIMUM PRINCIPLE). There exists a 
function $p^* : [0,T] \to \mathbb{R}^n$ satisfying the adjoint dynamics (ADJ), the maximization principle 
(M) and the terminal/transversality condition (T). 
Proof. The adjoint dynamics and terminal condition are both in (8.7). To confirm (M), 
fix $0 < s < T$ and $a \in A$, as above. Since the mapping $\varepsilon \mapsto P[\alpha_\varepsilon(\cdot)]$ for $0 \le \varepsilon \le 1$ has a 
maximum at $\varepsilon = 0$, we deduce from Lemma A.3 that 

$$
0 \ge \frac{d}{d\varepsilon} P[\alpha_\varepsilon(\cdot)] \Big|_{\varepsilon=0} = p^*(s) \cdot [f(x^*(s), a) - f(x^*(s), \alpha^*(s))].
$$

Hence 

$$
\begin{aligned}
H(x^*(s), p^*(s), a) &= f(x^*(s), a) \cdot p^*(s) \\
&\le f(x^*(s), \alpha^*(s)) \cdot p^*(s) = H(x^*(s), p^*(s), \alpha^*(s))
\end{aligned}
$$

for each $0 < s < T$ and $a \in A$. This proves the maximization condition (M). 
A.3. FREE ENDPOINT PROBLEM WITH RUNNING PAYOFFS 

We next cover the case that the payoff functional includes a running payoff: 

$$
(\mathrm{P})\qquad P[\alpha(\cdot)] = \int_0^T r(x(s), \alpha(s))\, ds + g(x(T)).
$$

The control theory Hamiltonian is now 

$$
H(x,p,a) = f(x,a) \cdot p + r(x,a)
$$

and we must manufacture a costate function $p^*(\cdot)$ satisfying (ADJ), (M) and (T). 


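ADDING A NEW VARIABLE. The trick is to introduce another variable and 
thereby convert to the previous case. We consider the function $x^{n+1} : [0,T] \to \mathbb{R}$ given by 

$$
(8.10)\qquad
\begin{cases}
\dot{x}^{n+1}(t) = r(x(t), \alpha(t)) & (0 \le t \le T) \\
x^{n+1}(0) = 0,
\end{cases}
$$

where $x(\cdot)$ solves (ODE). Introduce next the new notation 

$$
\bar{x} := \begin{pmatrix} x \\ x_{n+1} \end{pmatrix} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \\ x_{n+1} \end{pmatrix}, \quad
\bar{x}^0 := \begin{pmatrix} x^0 \\ 0 \end{pmatrix}, \quad
\bar{x}(t) := \begin{pmatrix} x(t) \\ x^{n+1}(t) \end{pmatrix}, \quad
\bar{f}(\bar{x}, a) := \begin{pmatrix} f(x,a) \\ r(x,a) \end{pmatrix} = \begin{pmatrix} f^1(x,a) \\ \vdots \\ f^n(x,a) \\ r(x,a) \end{pmatrix},
$$

and 

$$
\bar{g}(\bar{x}) := g(x) + x_{n+1}.
$$

Then (ODE) and (8.10) produce the dynamics 

$$
(\overline{\mathrm{ODE}})\qquad
\begin{cases}
\dot{\bar{x}}(t) = \bar{f}(\bar{x}(t), \alpha(t)) & (0 \le t \le T) \\
\bar{x}(0) = \bar{x}^0.
\end{cases}
$$

Consequently our control problem transforms into a new problem with no running payoff 
and the terminal payoff functional 

$$
(\overline{\mathrm{P}})\qquad \bar{P}[\alpha(\cdot)] := \bar{g}(\bar{x}(T)).
$$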





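We apply Theorem A.4, to obtain $\bar{p}^* : [0,T] \to \mathbb{R}^{n+1}$ satisfying (M) for the Hamiltonian 

$$
(8.11)\qquad \bar{H}(\bar{x}, \bar{p}, a) = \bar{f}(\bar{x}, a) \cdot \bar{p}.
$$

Also the adjoint equations (ADJ) hold, with the terminal transversality condition 

$$
(\mathrm{T})\qquad \bar{p}^*(T) = \nabla \bar{g}(\bar{x}^*(T)).
$$

But $\bar{f}$ does not depend upon the variable $x_{n+1}$, and so the $(n+1)^{\text{th}}$ equation in the adjoint 
equations (ADJ) reads 

$$
\dot{p}^{n+1,*}(t) = -\bar{H}_{x_{n+1}} = 0.
$$

Since $\bar{g}_{x_{n+1}} = 1$, we deduce that 

$$
(8.12)\qquad p^{n+1,*}(t) \equiv 1.
$$

As the $(n+1)^{\text{th}}$ component of the vector function $\bar{f}$ is r, we then conclude from (8.11) 
that 

$$
\bar{H}(\bar{x}, \bar{p}, a) = f(x,a) \cdot p + r(x,a) = H(x,p,a).
$$

Therefore 

$$
p^*(t) := \begin{pmatrix} p^{1,*}(t) \\ \vdots \\ p^{n,*}(t) \end{pmatrix}
$$

satisfies (ADJ) and (M) for the Hamiltonian H; and since the first n components of $\nabla \bar{g}$ are those of $\nabla g$, it also satisfies (T). 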


A.4. MULTIPLE CONTROL VARIATIONS. 


To derive the Pontryagin Maximum Principle for the fixed endpoint problem in §A.5 we 
will need to introduce some more complicated control variations, discussed in this section. 


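DEFINITION. Let us select times $0 < s_1 < s_2 < \cdots < s_N$, positive numbers $\lambda_1, \dots, \lambda_N > 0$, 
and also control parameters $a_1, a_2, \dots, a_N \in A$. 
We generalize our earlier definition (8.1) by now defining 

$$
(8.13)\qquad \alpha_\varepsilon(t) :=
\begin{cases}
a_k & \text{if } s_k - \lambda_k \varepsilon < t < s_k \quad (k = 1, \dots, N) \\
\alpha(t) & \text{otherwise,}
\end{cases}
$$

for $\varepsilon > 0$ taken so small that the intervals $[s_k - \lambda_k \varepsilon, s_k]$ do not overlap. This we will call 
a multiple variation of the control $\alpha(\cdot)$. 
Let $x_\varepsilon(\cdot)$ be the corresponding response of our system: 

$$
(8.14)\qquad
\begin{cases}
\dot{x}_\varepsilon(t) = f(x_\varepsilon(t), \alpha_\varepsilon(t)) & (t \ge 0) \\
x_\varepsilon(0) = x^0.
\end{cases}
$$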


NOTATION. (i) As before, $A(\cdot) = \nabla_x f(x(\cdot), \alpha(\cdot))$ and we write 

$$
(8.15)\qquad y(t) = Y(t,s) y^s \qquad (t \ge s)
$$

to denote the solution of 

$$
(8.16)\qquad
\begin{cases}
\dot{y}(t) = A(t) y(t) & (t \ge s) \\
y(s) = y^s,
\end{cases}
$$

where $y^s \in \mathbb{R}^n$ is given. 
(ii) Define 

$$
(8.17)\qquad y^{s_k} := f(x(s_k), a_k) - f(x(s_k), \alpha(s_k))
$$

for $k = 1, \dots, N$. 
We next generalize Lemma A.2: 


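LEMMA A.5 (MULTIPLE CONTROL VARIATIONS). We have 

$$
(8.18)\qquad x_\varepsilon(t) = x(t) + \varepsilon y(t) + o(\varepsilon) \quad \text{as } \varepsilon \to 0,
$$

uniformly for t in compact subsets of $[0,\infty)$, where 

$$
(8.19)\qquad
\begin{cases}
y(t) = 0 & (0 \le t \le s_1) \\
y(t) = \sum_{k=1}^{m} \lambda_k Y(t, s_k) y^{s_k} & (s_m \le t \le s_{m+1}) \text{ for } m = 1, \dots, N-1 \\
y(t) = \sum_{k=1}^{N} \lambda_k Y(t, s_k) y^{s_k} & (s_N \le t).
\end{cases}
$$

DEFINITION. The cone of variations at time t is the set 

$$
(8.20)\qquad K(t) := \left\{ \sum_{k=1}^N \lambda_k Y(t, s_k) y^{s_k} \;\middle|\; N = 1, 2, \dots,\ \lambda_k > 0,\ a_k \in A,\ 0 < s_1 \le s_2 \le \cdots \le s_N < t \right\}.
$$

Observe that $K(t)$ is a convex cone in $\mathbb{R}^n$, which according to Lemma A.5 consists of all 
changes in the state $x(t)$ (up to order $\varepsilon$) we can effect by multiple variations of the control 
$\alpha(\cdot)$. 

We will study the geometry of $K(t)$ in the next section, and for this will require the 
following topological lemma: 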


LEMMA A.6 (ZEROES OF A VECTOR FIELD). Let S denote a closed, bounded, 
convex subset of $\mathbb{R}^n$ and assume p is a point in the interior of S. Suppose 

$$
\Phi : S \to \mathbb{R}^n
$$

is a continuous vector field that satisfies the strict inequalities 

$$
(8.21)\qquad |\Phi(x) - x| < |x - p| \quad \text{for all } x \in \partial S.
$$

Then there exists a point $x \in S$ such that 

$$
(8.22)\qquad \Phi(x) = p.
$$


Proof. 1. Suppose first that S is the unit ball $B(0,1)$ and $p = 0$. Squaring (8.21) gives $|\Phi(x)|^2 - 2\Phi(x) \cdot x + |x|^2 < |x|^2$, and so we 
deduce that 

$$
\Phi(x) \cdot x > \tfrac{1}{2} |\Phi(x)|^2 \ge 0 \quad \text{for all } x \in \partial B(0,1).
$$

Then for small $t > 0$, the continuous mapping 

$$
\Psi(x) := x - t \Phi(x)
$$

maps $B(0,1)$ into itself, and hence has a fixed point $x^*$ according to Brouwer's Fixed Point 
Theorem. And then $\Phi(x^*) = 0$. 

2. In the general case, we can always assume after a translation that $p = 0$. Then 0 
belongs to the interior of S. We next map S onto $B(0,1)$ by radial dilation, and map $\Phi$ 
by rigid motion. This process converts the problem to the previous situation. 





A.5. FIXED ENDPOINT PROBLEM. 
In this last section we treat the fixed endpoint problem, characterized by the constraint 

$$
(8.23)\qquad x(\tau) = x^1,
$$

where $\tau = \tau[\alpha(\cdot)]$ is the first time that $x(\cdot)$ hits the given target point $x^1 \in \mathbb{R}^n$. The payoff 
functional is 

$$
(\mathrm{P})\qquad P[\alpha(\cdot)] = \int_0^{\tau} r(x(s), \alpha(s))\, ds.
$$


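ADDING A NEW VARIABLE. As in §A.3 we define the function $x^{n+1} : [0,\tau] \to \mathbb{R}$ by 

$$
\begin{cases}
\dot{x}^{n+1}(t) = r(x(t), \alpha(t)) & (0 \le t \le \tau) \\
x^{n+1}(0) = 0,
\end{cases}
$$

and reintroduce the notation 

$$
\bar{x} := \begin{pmatrix} x \\ x_{n+1} \end{pmatrix}, \quad
\bar{x}^0 := \begin{pmatrix} x^0 \\ 0 \end{pmatrix}, \quad
\bar{x}(t) := \begin{pmatrix} x(t) \\ x^{n+1}(t) \end{pmatrix}, \quad
\bar{f}(\bar{x}, a) := \begin{pmatrix} f(x,a) \\ r(x,a) \end{pmatrix},
$$

with 

$$
\bar{g}(\bar{x}) = x_{n+1}.
$$

The problem is therefore to find controlled dynamics satisfying 

$$
(\overline{\mathrm{ODE}})\qquad
\begin{cases}
\dot{\bar{x}}(t) = \bar{f}(\bar{x}(t), \alpha(t)) & (0 \le t \le \tau) \\
\bar{x}(0) = \bar{x}^0,
\end{cases}
$$

and maximizing 

$$
(\overline{\mathrm{P}})\qquad \bar{g}(\bar{x}(\tau)) = x^{n+1}(\tau),
$$

$\tau$ being the first time that $x(\tau) = x^1$. In other words, the first n components of $\bar{x}(\tau)$ are 
prescribed, and we want to maximize the $(n+1)^{\text{th}}$ component. 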


We assume that $\alpha^*(\cdot)$ is an optimal control for this problem, corresponding to the 
optimal trajectory $x^*(\cdot)$; our task is to construct the corresponding costate $p^*(\cdot)$, satisfying 
the maximization principle (M). As usual, we drop the superscript * to simplify notation. 


THE CONE OF VARIATIONS. We will employ the notation and theory from the 
previous section, changed only in that we now work with n + 1 variables (as we will be 
reminded by the overbar on various expressions). 

Our program for building the costate depends upon our taking multiple variations, as 
in §A.4, and understanding the resulting cone of variations at time $\tau$: 





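$$
(8.24)\qquad K = K(\tau) := \left\{ \sum_{k=1}^N \lambda_k \bar{Y}(\tau, s_k) \bar{y}^{s_k} \;\middle|\; N = 1, 2, \dots,\ \lambda_k > 0,\ a_k \in A,\ 0 < s_1 \le s_2 \le \cdots \le s_N < \tau \right\},
$$

for 

$$
(8.25)\qquad \bar{y}^{s_k} := \bar{f}(\bar{x}(s_k), a_k) - \bar{f}(\bar{x}(s_k), \alpha(s_k)).
$$

We are now writing 

$$
(8.26)\qquad \bar{y}(t) = \bar{Y}(t,s) \bar{y}^s
$$

for the solution of 

$$
(8.27)\qquad
\begin{cases}
\dot{\bar{y}}(t) = \bar{A}(t) \bar{y}(t) & (s \le t \le \tau) \\
\bar{y}(s) = \bar{y}^s,
\end{cases}
$$

with $\bar{A}(\cdot) := \nabla_{\bar{x}} \bar{f}(\bar{x}(\cdot), \alpha(\cdot))$. 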
LEMMA A.7 (GEOMETRY OF THE CONE OF VARIATIONS). We have 

$$
(8.28)\qquad e^{n+1} \notin K^\circ.
$$

Here $K^\circ$ denotes the interior of K and $e^k = (0, \dots, 1, \dots, 0)^T$, the 1 in the k-th slot. 

Proof. 1. If (8.28) were false, there would then exist n + 1 linearly independent vectors 
$z^1, \dots, z^{n+1} \in K$ such that 

$$
e^{n+1} = \sum_{k=1}^{n+1} \lambda_k z^k
$$

with positive constants $\lambda_k > 0$ and 

$$
(8.29)\qquad z^k = \bar{Y}(\tau, s_k) \bar{y}^{s_k}
$$

for appropriate times $0 < s_1 < s_2 < \cdots < s_{n+1} < \tau$ and vectors $\bar{y}^{s_k} = \bar{f}(\bar{x}(s_k), a_k) - \bar{f}(\bar{x}(s_k), \alpha(s_k))$, for $k = 1, \dots, n+1$. 

2. We will next construct a control $\alpha_\varepsilon(\cdot)$, having the multiple variation form (8.13), 
with corresponding response $\bar{x}_\varepsilon(\cdot) = (x_\varepsilon(\cdot)^T, x^{n+1}_\varepsilon(\cdot))^T$ satisfying 

$$
(8.30)\qquad x_\varepsilon(\tau) = x^1
$$

and 

$$
(8.31)\qquad x^{n+1}_\varepsilon(\tau) > x^{n+1}(\tau).
$$

This will be a contradiction to the optimality of the control $\alpha(\cdot)$: (8.30) says that the new 
control satisfies the endpoint constraint and (8.31) says it increases the payoff. 


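3. Introduce for small $\eta > 0$ the closed and convex set 

$$
S := \left\{ x = \sum_{k=1}^{n+1} \lambda_k z^k \;\middle|\; 0 \le \lambda_k \le \eta \right\}.
$$

Since the vectors $z^1, \dots, z^{n+1}$ are independent, S has an interior. 
Now define for small $\varepsilon > 0$ the mapping 

$$
\Phi_\varepsilon : S \to \mathbb{R}^{n+1}
$$

by setting 

$$
\Phi_\varepsilon(x) := \bar{x}_\varepsilon(\tau) - \bar{x}(\tau)
$$

for $x = \sum_{k=1}^{n+1} \lambda_k z^k$, where $\bar{x}_\varepsilon(\cdot)$ solves (8.14) for the control $\alpha_\varepsilon(\cdot)$ defined by (8.13). 

We assert that if $\mu, \eta, \varepsilon > 0$ are small enough, then 

$$
\Phi_\varepsilon(x) = p := \mu e^{n+1} = (0, \dots, 0, \mu)^T
$$

for some $x \in S$. To see this, note that 

$$
\begin{aligned}
|\Phi_\varepsilon(x) - x| &= |\bar{x}_\varepsilon(\tau) - \bar{x}(\tau) - x| = o(|x|) \quad \text{as } x \to 0,\ x \in S \\
&< |x - p| \quad \text{for all } x \in \partial S.
\end{aligned}
$$

Now apply Lemma A.6. The resulting point $x \in S$ gives $\bar{x}_\varepsilon(\tau) = \bar{x}(\tau) + \mu e^{n+1}$, which is (8.30) together with (8.31). 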


EXISTENCE OF THE COSTATE. We now restore the superscripts * and so write 
$x^*(\cdot)$ for $x(\cdot)$, etc. 

THEOREM A.8 (PONTRYAGIN MAXIMUM PRINCIPLE). Assuming our problem 
is not abnormal, there exists a function $p^* : [0,\tau^*] \to \mathbb{R}^n$ satisfying the adjoint 
dynamics (ADJ) and the maximization principle (M). 

The proof explains what "abnormal" means in this context. 











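Proof. 1. Since $e^{n+1} \notin K^\circ$ according to Lemma A.7 and K is a convex cone, the separating hyperplane theorem provides a nonzero vector $w \in \mathbb{R}^{n+1}$ 
such that 

$$
(8.32)\qquad w \cdot z \le 0 \quad \text{for all } z \in K
$$

and 

$$
(8.33)\qquad w^{n+1} \ge 0.
$$

Let $\bar{p}^*(\cdot)$ solve (ADJ), with the terminal condition 

$$
\bar{p}^*(\tau) = w.
$$

Then, since $\bar{f}$ does not depend upon $x_{n+1}$ (as in §A.3), 

$$
(8.34)\qquad p^{n+1,*}(\cdot) \equiv w^{n+1} \ge 0.
$$

Fix any time $0 \le s < \tau$, any control value $a \in A$, and set 

$$
\bar{y}^s := \bar{f}(\bar{x}^*(s), a) - \bar{f}(\bar{x}^*(s), \alpha^*(s)).
$$

Now solve 

$$
\begin{cases}
\dot{\bar{y}}(t) = \bar{A}(t) \bar{y}(t) & (s \le t \le \tau) \\
\bar{y}(s) = \bar{y}^s,
\end{cases}
$$

so that, as in §A.2, 

$$
0 \ge w \cdot \bar{y}(\tau) = \bar{p}^*(\tau) \cdot \bar{y}(\tau) = \bar{p}^*(s) \cdot \bar{y}(s) = \bar{p}^*(s) \cdot \bar{y}^s.
$$

Therefore 

$$
\bar{p}^*(s) \cdot [\bar{f}(\bar{x}^*(s), a) - \bar{f}(\bar{x}^*(s), \alpha^*(s))] \le 0;
$$

and then 

$$
\begin{aligned}
(8.35)\qquad \bar{H}(\bar{x}^*(s), \bar{p}^*(s), a) &= \bar{f}(\bar{x}^*(s), a) \cdot \bar{p}^*(s) \\
&\le \bar{f}(\bar{x}^*(s), \alpha^*(s)) \cdot \bar{p}^*(s) = \bar{H}(\bar{x}^*(s), \bar{p}^*(s), \alpha^*(s)),
\end{aligned}
$$

for the Hamiltonian 

$$
\bar{H}(\bar{x}, \bar{p}, a) = \bar{f}(\bar{x}, a) \cdot \bar{p}.
$$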


2. We now must address two situations, according to whether 

$$
(8.36)\qquad w^{n+1} > 0
$$

or 

$$
(8.37)\qquad w^{n+1} = 0.
$$

When (8.36) holds, we can divide $\bar{p}^*(\cdot)$ by the absolute value of $w^{n+1}$ and recall (8.34) to 
reduce to the case that 

$$
p^{n+1,*}(\cdot) \equiv 1.
$$

Then, as in §A.3, the maximization formula (8.35) implies 

$$
H(x^*(s), p^*(s), a) \le H(x^*(s), p^*(s), \alpha^*(s))
$$

for 

$$
H(x,p,a) = f(x,a) \cdot p + r(x,a).
$$

This is the maximization principle (M), as required. 

When (8.37) holds, we have an abnormal problem, as discussed in the Remarks and 
Warning after Theorem 4.4. Those comments explain how to reformulate the Pontryagin 
Maximum Principle for abnormal problems. 





CRITIQUE. (i) The foregoing proofs are not complete, in that we have silently passed 
over certain measurability concerns and also ignored in (8.29) the possibility that some of 
the times $s_k$ are equal. 

(ii) We have also not (yet) proved that 

$$
t \mapsto H(x^*(t), p^*(t), \alpha^*(t)) \ \text{is constant}
$$

in §A.2 and A.3, and 

$$
H(x^*(t), p^*(t), \alpha^*(t)) \equiv 0
$$

in §A.5. 











A.6. REFERENCES. 

We mostly followed Fleming-Rishel [F-R] for §A.1-§A.3 and Macki-Strauss [M-S] for 
§A.4 and §A.5. Another approach is discussed in Craven [Cr]. Hocking [H] has a nice 
heuristic discussion. 